Обсуждение: How would you store read/unread topic status?

Поиск
Список
Период
Сортировка

How would you store read/unread topic status?

От
Mathieu Nebra
Дата:
Hi all,

I'm running a quite large website which has its own forums. They are
currently heavily used and I'm getting performance issues. Most of them
are due to repeated UPDATE queries on a "flags" table.

This "flags" table has more or less the following fields:

UserID - TopicID - LastReadAnswerID

The flags table keeps track of every topic a member has visited and
remembers the last answer which was posted at this moment. It allows the
user to come back a few days after and immediately jump to the last
answer he has not read.

My problem is that everytime a user READS a topic, it UPDATES this flags
table to remember he has read it. This leads to multiple updates at the
same time on the same table, and an update can take a few seconds. This
is not acceptable for my users.

Question: what is the general rule of thumb here? How would you store
this information?

Thanks a lot in advance.
Mathieu.

Re: How would you store read/unread topic status?

От
Andres Freund
Дата:
On 06/23/2009 01:12 PM, Mathieu Nebra wrote:
> I'm running a quite large website which has its own forums. They are
> currently heavily used and I'm getting performance issues. Most of them
> are due to repeated UPDATE queries on a "flags" table.
>
> This "flags" table has more or less the following fields:
>
> UserID - TopicID - LastReadAnswerID
>
> The flags table keeps track of every topic a member has visited and
> remembers the last answer which was posted at this moment. It allows the
> user to come back a few days after and immediately jump to the last
> answer he has not read.
> My problem is that everytime a user READS a topic, it UPDATES this flags
> table to remember he has read it. This leads to multiple updates at the
> same time on the same table, and an update can take a few seconds. This
> is not acceptable for my users.
Have you analyzed why it takes that long? Determining that is the first
step of improving the current situation...

My first guess would be, that your disks cannot keep up with the number
of syncronous writes/second. Do you know how many transactions with
write access you have? Guessing from your description you do at least
one write for every page hit on your forum.

With the default settings every transaction needs to wait for io at the
end - to ensure transactional semantics.
Depending on your disk the number of possible writes/second is quite low
- a normal SATA disk with 7200rpm can satisfy something around 130
syncronous writes per second. Which is the upper limit on writing
transactions per second.
What disks do you have?

On which OS are you? If you are on linux you could use iostat to get
some relevant statistics like:
iostat -x /path/to/device/the/database/resides/on 2 10

That gives you 10 statistics over periods of 2 seconds.


Depending on those results there are numerous solutions to that problem...

> Question: what is the general rule of thumb here? How would you store
> this information?
The problem here is, that every read access writes to disk - that is not
going to scale very well.
One possible solution is to use something like memcached to store the
last read post in memory and periodically write it into the database.


Which pg version are you using?


Andres

Re: How would you store read/unread topic status?

От
Alexander Staubo
Дата:
On Tue, Jun 23, 2009 at 1:12 PM, Mathieu Nebra<mateo21@siteduzero.com> wrote:
> This "flags" table has more or less the following fields:
>
> UserID - TopicID - LastReadAnswerID

We are doing pretty much same thing.

> My problem is that everytime a user READS a topic, it UPDATES this flags
> table to remember he has read it. This leads to multiple updates at the
> same time on the same table, and an update can take a few seconds. This
> is not acceptable for my users.

First of all, and I'm sure you thought of this, an update isn't needed
every time a user reads a topic; only when there are new answers that
need to be marked as read. So an "update ... where last_read_answer_id
< ?" should avoid the need for an update.

(That said, I believe PostgreSQL diffs tuple updates, so in practice
PostgreSQL might not be writing anything if you run an "update" with
the same value. I will let someone more intimate with the internal
details of updates to comment on this.)

Secondly, an update should not take "a few seconds". You might want to
investigate this part before you turn to further optimizations.

In our application we defer the updates to a separate asynchronous
process using a simple queue mechanism, but in our case, we found that
the updates are fast enough (in the order of a few milliseconds) not
to warrant batching them into single transactions.

A.

Re: How would you store read/unread topic status?

От
Andres Freund
Дата:
On 06/23/2009 02:37 PM, Alexander Staubo wrote:
> (That said, I believe PostgreSQL diffs tuple updates, so in practice
> PostgreSQL might not be writing anything if you run an "update" with
> the same value. I will let someone more intimate with the internal
> details of updates to comment on this.)
No, it does not do that by default.
You can write a trigger to do that though - and there is one packaged
with the core version in the upcoming 8.4 version.

Andres

Re: How would you store read/unread topic status?

От
justin
Дата:
Mathieu Nebra wrote:
> Hi all,
>
> I'm running a quite large website which has its own forums. They are
> currently heavily used and I'm getting performance issues. Most of them
> are due to repeated UPDATE queries on a "flags" table.
>
> This "flags" table has more or less the following fields:
>
> UserID - TopicID - LastReadAnswerID
>
> The flags table keeps track of every topic a member has visited and
> remembers the last answer which was posted at this moment. It allows the
> user to come back a few days after and immediately jump to the last
> answer he has not read.
>
> My problem is that everytime a user READS a topic, it UPDATES this flags
> table to remember he has read it. This leads to multiple updates at the
> same time on the same table, and an update can take a few seconds. This
> is not acceptable for my users.
>
> Question: what is the general rule of thumb here? How would you store
> this information?
>
> Thanks a lot in advance.
> Mathieu.
>
>
Sounds like the server is getting IO bound by checkpoints causing flush
to disk causing a IO to become bound.

http://www.westnet.com/~gsmith/content/postgresql/chkp-bgw-83.htm
there is some 8.0-8.2 tuning ideas in this link.

Yes this is acceptable way to store such information.

What is the PG version.  performance tuning  options are different
depending on the version???
http://wiki.postgresql.org/wiki/Performance_Optimization
http://www.westnet.com/~gsmith/content/postgresql/chkp-bgw-83.htm

Re: How would you store read/unread topic status?

От
Nikolas Everett
Дата:


In our application we defer the updates to a separate asynchronous
process using a simple queue mechanism, but in our case, we found that
the updates are fast enough (in the order of a few milliseconds) not
to warrant batching them into single transactions.

We do a very similar trick for another sort of data and its worked wonders for performance.  We had more frequent updates to fewer rows, though.  If you happen to be using Java, HashMap and TreeMap are perfect for this because they are reentrant so you don't have to worry about synchronizing your sweeper with your web page activities.  As an added bonus, when you do this trick you don't have to query this information from the database unless you have a cache miss.

Re: How would you store read/unread topic status?

От
Matthew Wakeling
Дата:
On Tue, 23 Jun 2009, Nikolas Everett wrote:
> If you happen to be using Java, HashMap and TreeMap are perfect for this
> because they are reentrant so you don't have to worry about
> synchronizing your sweeper with your web page activities.

See the note in http://java.sun.com/javase/6/docs/api/java/util/TreeMap.html

> "Note that this implementation is not synchronized."

If you have multiple threads accessing a TreeMap or HashMap, then they
must be synchronised to ensure that only one thread at a time is accessing
it. Otherwise, you may suffer severe data loss and possibly even JVM
crashes. Perhaps you meant java.util.concurrent.ConcurrentHashMap?

Be very careful.

Matthew

--
 Now, you would have thought these coefficients would be integers, given that
 we're working out integer results. Using a fraction would seem really
 stupid. Well, I'm quite willing to be stupid here - in fact, I'm going to
 use complex numbers.                    -- Computer Science Lecturer

Re: How would you store read/unread topic status?

От
Mathieu Nebra
Дата:
> On 06/23/2009 01:12 PM, Mathieu Nebra wrote:
>> >> I'm running a quite large website which has its own forums. They are
>> >> currently heavily used and I'm getting performance issues. Most of
them
>> >> are due to repeated UPDATE queries on a "flags" table.
>> >>
>> >> This "flags" table has more or less the following fields:
>> >>
>> >> UserID - TopicID - LastReadAnswerID
>> >>
>> >> The flags table keeps track of every topic a member has visited and
>> >> remembers the last answer which was posted at this moment. It
allows the
>> >> user to come back a few days after and immediately jump to the last
>> >> answer he has not read.
>> >> My problem is that everytime a user READS a topic, it UPDATES this
flags
>> >> table to remember he has read it. This leads to multiple updates
at the
>> >> same time on the same table, and an update can take a few seconds.
This
>> >> is not acceptable for my users.
> > Have you analyzed why it takes that long? Determining that is the first
> > step of improving the current situation...
> >
> > My first guess would be, that your disks cannot keep up with the number
> > of syncronous writes/second. Do you know how many transactions with
> > write access you have? Guessing from your description you do at least
> > one write for every page hit on your forum.

I don't know how many writes/s Pgsql can handle on my server, but I
first suspected that it was good practice to avoid unnecessary writes.

I do 1 write/page for every connected user on the forums.
I do the same on another part of my website to increment the number of
page views (this was not part of my initial question but it is very close).

> >
> > With the default settings every transaction needs to wait for io at the
> > end - to ensure transactional semantics.
> > Depending on your disk the number of possible writes/second is quite low
> > - a normal SATA disk with 7200rpm can satisfy something around 130
> > syncronous writes per second. Which is the upper limit on writing
> > transactions per second.
> > What disks do you have?

We have 2 SAS RAID 0 15000rpm disks.

> >
> > On which OS are you? If you are on linux you could use iostat to get
> > some relevant statistics like:
> > iostat -x /path/to/device/the/database/resides/on 2 10
> >
> > That gives you 10 statistics over periods of 2 seconds.
> >
> >
> > Depending on those results there are numerous solutions to that
problem...

Here it is:

$ iostat -x /dev/sda 2 10
Linux 2.6.18-6-amd64 (scratchy)     23.06.2009

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          18,02    0,00   12,87   13,13    0,00   55,98

Device:         rrqm/s   wrqm/s   r/s   w/s   rsec/s   wsec/s avgrq-sz
avgqu-sz   await  svctm  %util
sda               0,94   328,98 29,62 103,06   736,58  6091,14    51,46
    0,04    0,25   0,04   0,51

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          39,65    0,00   48,38    2,00    0,00    9,98

Device:         rrqm/s   wrqm/s   r/s   w/s   rsec/s   wsec/s avgrq-sz
avgqu-sz   await  svctm  %util
sda               0,00     0,00 10,00 78,00   516,00  1928,00    27,77
   6,44   73,20   2,75  24,20

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          40,15    0,00   48,13    2,24    0,00    9,48

Device:         rrqm/s   wrqm/s   r/s   w/s   rsec/s   wsec/s avgrq-sz
avgqu-sz   await  svctm  %util
sda               0,00     0,00  6,47 100,50   585,07  2288,56    26,87
   13,00  121,56   3,00  32,04

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          45,14    0,00   45,64    6,73    0,00    2,49

Device:         rrqm/s   wrqm/s   r/s   w/s   rsec/s   wsec/s avgrq-sz
avgqu-sz   await  svctm  %util
sda               1,00     0,00 34,00 157,50  1232,00  3904,00    26,82
   26,64  139,09   3,03  58,00

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          46,25    0,00   49,25    3,50    0,00    1,00

Device:         rrqm/s   wrqm/s   r/s   w/s   rsec/s   wsec/s avgrq-sz
avgqu-sz   await  svctm  %util
sda               0,00     0,00 27,00 173,00   884,00  4224,00    25,54
   24,46  122,32   3,00  60,00

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          44,42    0,00   47,64    2,23    0,00    5,71

Device:         rrqm/s   wrqm/s   r/s   w/s   rsec/s   wsec/s avgrq-sz
avgqu-sz   await  svctm  %util
sda               0,00     0,00 15,42 140,30   700,50  3275,62    25,53
   17,94  115,21   2,81  43,78

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          41,75    0,00   48,50    2,50    0,00    7,25

Device:         rrqm/s   wrqm/s   r/s   w/s   rsec/s   wsec/s avgrq-sz
avgqu-sz   await  svctm  %util
sda               0,50     0,00 21,11 116,08   888,44  2472,36    24,50
   12,62   91,99   2,55  34,97

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          44,03    0,00   46,27    2,99    0,00    6,72

Device:         rrqm/s   wrqm/s   r/s   w/s   rsec/s   wsec/s avgrq-sz
avgqu-sz   await  svctm  %util
sda               9,00     0,00 10,00 119,00   484,00  2728,00    24,90
   15,15  117,47   2,70  34,80

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          36,91    0,00   51,37    2,49    0,00    9,23

Device:         rrqm/s   wrqm/s   r/s   w/s   rsec/s   wsec/s avgrq-sz
avgqu-sz   await  svctm  %util
sda               0,99     0,00 14,78 136,45   390,15  2825,62    21,26
   21,86  144,52   2,58  39,01

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          38,75    0,00   48,75    1,00    0,00   11,50

Device:         rrqm/s   wrqm/s   r/s   w/s   rsec/s   wsec/s avgrq-sz
avgqu-sz   await  svctm  %util
sda               0,00     0,00  7,54 67,34   377,89  1764,82    28,62
   5,38   71,89   2,95  22,11



> >
>> >> Question: what is the general rule of thumb here? How would you store
>> >> this information?
> > The problem here is, that every read access writes to disk - that is not
> > going to scale very well.

That's what I thought.

> > One possible solution is to use something like memcached to store the
> > last read post in memory and periodically write it into the database.
> >

We're starting using memcached. But how would you "periodically" write
that to database?

> >
> > Which pg version are you using?

I should have mentionned that before sorry: PostgreSQL 8.2

Thanks a lot!



Andres Freund a écrit :
> On 06/23/2009 01:12 PM, Mathieu Nebra wrote:
>> I'm running a quite large website which has its own forums. They are
>> currently heavily used and I'm getting performance issues. Most of them
>> are due to repeated UPDATE queries on a "flags" table.
>>
>> This "flags" table has more or less the following fields:
>>
>> UserID - TopicID - LastReadAnswerID
>>
>> The flags table keeps track of every topic a member has visited and
>> remembers the last answer which was posted at this moment. It allows the
>> user to come back a few days after and immediately jump to the last
>> answer he has not read.
>> My problem is that everytime a user READS a topic, it UPDATES this flags
>> table to remember he has read it. This leads to multiple updates at the
>> same time on the same table, and an update can take a few seconds. This
>> is not acceptable for my users.
> Have you analyzed why it takes that long? Determining that is the first
> step of improving the current situation...
>
> My first guess would be, that your disks cannot keep up with the number
> of syncronous writes/second. Do you know how many transactions with
> write access you have? Guessing from your description you do at least
> one write for every page hit on your forum.
>
> With the default settings every transaction needs to wait for io at the
> end - to ensure transactional semantics.
> Depending on your disk the number of possible writes/second is quite low
> - a normal SATA disk with 7200rpm can satisfy something around 130
> syncronous writes per second. Which is the upper limit on writing
> transactions per second.
> What disks do you have?
>
> On which OS are you? If you are on linux you could use iostat to get
> some relevant statistics like:
> iostat -x /path/to/device/the/database/resides/on 2 10
>
> That gives you 10 statistics over periods of 2 seconds.
>
>
> Depending on those results there are numerous solutions to that problem...
>
>> Question: what is the general rule of thumb here? How would you store
>> this information?
> The problem here is, that every read access writes to disk - that is not
> going to scale very well.
> One possible solution is to use something like memcached to store the
> last read post in memory and periodically write it into the database.
>
>
> Which pg version are you using?
>
>
> Andres

Re: How would you store read/unread topic status?

От
Mathieu Nebra
Дата:
Alexander Staubo a écrit :
> On Tue, Jun 23, 2009 at 1:12 PM, Mathieu Nebra<mateo21@siteduzero.com> wrote:
>> This "flags" table has more or less the following fields:
>>
>> UserID - TopicID - LastReadAnswerID
>
> We are doing pretty much same thing.
>
>> My problem is that everytime a user READS a topic, it UPDATES this flags
>> table to remember he has read it. This leads to multiple updates at the
>> same time on the same table, and an update can take a few seconds. This
>> is not acceptable for my users.
>
> First of all, and I'm sure you thought of this, an update isn't needed
> every time a user reads a topic; only when there are new answers that
> need to be marked as read. So an "update ... where last_read_answer_id
> < ?" should avoid the need for an update.

We don't work that way. We just "remember" he has read these answers and
then we can tell him "there are no new messages for you to read".
So we just need to write what he has read when he reads it.

>
> (That said, I believe PostgreSQL diffs tuple updates, so in practice
> PostgreSQL might not be writing anything if you run an "update" with
> the same value. I will let someone more intimate with the internal
> details of updates to comment on this.)
>
> Secondly, an update should not take "a few seconds". You might want to
> investigate this part before you turn to further optimizations.

Yes, I know there is a problem but I don't know if I am competent enough
to tune PostgreSQL for that. It can take a while to understand the
problem, and I'm not sure I'll have the time for that.

I am, however, opened to suggestions. Maybe I'm doing something wrong
somewhere.

>
> In our application we defer the updates to a separate asynchronous
> process using a simple queue mechanism, but in our case, we found that
> the updates are fast enough (in the order of a few milliseconds) not
> to warrant batching them into single transactions.

A few milliseconds would be cool.
In fact, defering to another process is a good idea, but I'm not sure if
it is easy to implement. It would be great to have some sort of UPDATE
... LOW PRIORITY to make the request non blocking.

Thanks.

Re: How would you store read/unread topic status?

От
Robert Haas
Дата:
>> > Which pg version are you using?
>
> I should have mentionned that before sorry: PostgreSQL 8.2

I think there is an awful lot of speculation on this thread about what
your problem is without anywhere near enough investigation.  A couple
of seconds for an update is a really long time, unless your server is
absolutely slammed, in which case probably everything is taking a long
time.  We need to get some more information on what is happening here.
 Approximately how many requests per second are you servicing?  Also,
can you:

1. Run EXPLAIN ANALYZE on a representative UPDATE statement and post
the exact query and the output.

2. Run VACUUM VERBOSE on your database and send the last 10 lines or
so of the output.

3. Try your UPDATE statement at a low-traffic time of day and see
whether it's faster than it is at a high-traffic time of day, and by
how much.  Or dump your database and reload it on a dev server and see
how fast it runs there.

...Robert

Re: How would you store read/unread topic status?

От
Guillaume Cottenceau
Дата:
Mathieu Nebra <mateo21 'at' siteduzero.com> writes:

>> (That said, I believe PostgreSQL diffs tuple updates, so in practice
>> PostgreSQL might not be writing anything if you run an "update" with
>> the same value. I will let someone more intimate with the internal
>> details of updates to comment on this.)
>>
>> Secondly, an update should not take "a few seconds". You might want to
>> investigate this part before you turn to further optimizations.
>
> Yes, I know there is a problem but I don't know if I am competent enough
> to tune PostgreSQL for that. It can take a while to understand the
> problem, and I'm not sure I'll have the time for that.

Short story: run the query in psql prepending EXPLAIN ANALYZE in
front of it and copy-paste the output in reply to that list.

Long story: there are a lot of interesting material in PG
official documentation about optimization. It is very worth a
read but it's longer than a short story. In my experience,
database performance can be degraded orders of magnitude if not
configured properly.

> I am, however, opened to suggestions. Maybe I'm doing something wrong
> somewhere.
>
>>
>> In our application we defer the updates to a separate asynchronous
>> process using a simple queue mechanism, but in our case, we found that
>> the updates are fast enough (in the order of a few milliseconds) not
>> to warrant batching them into single transactions.
>
> A few milliseconds would be cool.

That also depends on the query. If your update selects rows not
according to an index you're going to be in trouble if the table
hosts a lot of data, but that's fair. So you might just need an
index. That might also be related to row bloat. Your query with
EXPLAIN ANALYZE would tell what postgres does (if it uses an
index or not).

> In fact, defering to another process is a good idea, but I'm not sure if
> it is easy to implement. It would be great to have some sort of UPDATE

No article on the site du zéro explaining how to implement
producer-consumers? :) But that must really be thought before
implementing. It's not worth piling queries in memory because it
will create other problems if queries are produced faster than
consumed in the long run.

--
Guillaume Cottenceau

Re: How would you store read/unread topic status?

От
Andres Freund
Дата:
On 06/23/2009 04:54 PM, Mathieu Nebra wrote:
>> On 06/23/2009 01:12 PM, Mathieu Nebra wrote:
>>>>> I'm running a quite large website which has its own forums.
>>>>> They are currently heavily used and I'm getting performance
>>>>> issues. Most of
> them
>>>>> are due to repeated UPDATE queries on a "flags" table.
>>>>>
>>>>> This "flags" table has more or less the following fields:
>>>>>
>>>>> UserID - TopicID - LastReadAnswerID
>>>>>
>>>>> The flags table keeps track of every topic a member has
>>>>> visited and remembers the last answer which was posted at
>>>>> this moment. It allows the user to come back a few days
>>>>> after and immediately jump to the last answer he has not
>>>>> read. My problem is that everytime a user READS a topic, it
>>>>> UPDATES this flags table to remember he has read it. This
>>>>> leads to multiple updates at the same time on the same table,
>>>>> and an update can take a few seconds. This is not acceptable
>>>>> for my users.
>>> Have you analyzed why it takes that long? Determining that is the
>>> first step of improving the current situation...
>>>
>>> My first guess would be, that your disks cannot keep up with the
>>>  number of syncronous writes/second. Do you know how many
>>> transactions with write access you have? Guessing from your
>>> description you do at least one write for every page hit on your
>>>  forum.
>
> I don't know how many writes/s Pgsql can handle on my server, but I
> first suspected that it was good practice to avoid unnecessary
> writes.
It surely is.

> I do 1 write/page for every connected user on the forums. I do the
> same on another part of my website to increment the number of page
> views (this was not part of my initial question but it is very
> close).
That even more cries for some in-memory-caching.

>>> On which OS are you? If you are on linux you could use iostat to
>>>  get some relevant statistics like: iostat -x
>>> /path/to/device/the/database/resides/on 2 10
>>>
>>> That gives you 10 statistics over periods of 2 seconds.
>>>
>>>
>>> Depending on those results there are numerous solutions to that
> problem...
>
> Here it is:
>
> $ iostat -x /dev/sda 2 10 Linux 2.6.18-6-amd64 (scratchy) 23.06.2009
>
> avg-cpu:  %user   %nice %system %iowait  %steal   %idle 18,02 0,00
> 12,87   13,13    0,00   55,98
>
> Device:         rrqm/s   wrqm/s   r/s   w/s   rsec/s   wsec/s
> avgrq-sz avgqu-sz   await  svctm  %util sda               0,94
> 328,98 29,62 103,06   736,58  6091,14    51,46 0,04    0,25   0,04
> 0,51
>
> avg-cpu:  %user   %nice %system %iowait  %steal   %idle 39,65 0,00
> 48,38    2,00    0,00    9,98
>
> Device:         rrqm/s   wrqm/s   r/s   w/s   rsec/s   wsec/s
> avgrq-sz avgqu-sz   await  svctm  %util sda               0,00 0,00
> 10,00 78,00   516,00  1928,00    27,77 6,44   73,20   2,75 24,20
>
> avg-cpu:  %user   %nice %system %iowait  %steal   %idle 40,15 0,00
> 48,13    2,24    0,00    9,48
>
> Device:         rrqm/s   wrqm/s   r/s   w/s   rsec/s   wsec/s
> avgrq-sz avgqu-sz   await  svctm  %util sda               0,00 0,00
> 6,47 100,50   585,07  2288,56    26,87 13,00  121,56   3,00 32,04
>
> avg-cpu:  %user   %nice %system %iowait  %steal   %idle 45,14 0,00
> 45,64    6,73    0,00    2,49
>
> Device:         rrqm/s   wrqm/s   r/s   w/s   rsec/s   wsec/s
> avgrq-sz avgqu-sz   await  svctm  %util sda               1,00 0,00
> 34,00 157,50  1232,00  3904,00    26,82 26,64  139,09   3,03 58,00
>
> avg-cpu:  %user   %nice %system %iowait  %steal   %idle 46,25 0,00
> 49,25    3,50    0,00    1,00
>
> Device:         rrqm/s   wrqm/s   r/s   w/s   rsec/s   wsec/s
> avgrq-sz avgqu-sz   await  svctm  %util sda               0,00 0,00
> 27,00 173,00   884,00  4224,00    25,54 24,46  122,32   3,00 60,00
>
> avg-cpu:  %user   %nice %system %iowait  %steal   %idle 44,42 0,00
> 47,64    2,23    0,00    5,71
>
> Device:         rrqm/s   wrqm/s   r/s   w/s   rsec/s   wsec/s
> avgrq-sz avgqu-sz   await  svctm  %util sda               0,00 0,00
> 15,42 140,30   700,50  3275,62    25,53 17,94  115,21   2,81 43,78
>
> avg-cpu:  %user   %nice %system %iowait  %steal   %idle 41,75 0,00
> 48,50    2,50    0,00    7,25
>
> Device:         rrqm/s   wrqm/s   r/s   w/s   rsec/s   wsec/s
> avgrq-sz avgqu-sz   await  svctm  %util sda               0,50 0,00
> 21,11 116,08   888,44  2472,36    24,50 12,62   91,99   2,55 34,97
>
> avg-cpu:  %user   %nice %system %iowait  %steal   %idle 44,03 0,00
> 46,27    2,99    0,00    6,72
>
> Device:         rrqm/s   wrqm/s   r/s   w/s   rsec/s   wsec/s
> avgrq-sz avgqu-sz   await  svctm  %util sda               9,00 0,00
> 10,00 119,00   484,00  2728,00    24,90 15,15  117,47   2,70 34,80
>
> avg-cpu:  %user   %nice %system %iowait  %steal   %idle 36,91 0,00
> 51,37    2,49    0,00    9,23
>
> Device:         rrqm/s   wrqm/s   r/s   w/s   rsec/s   wsec/s
> avgrq-sz avgqu-sz   await  svctm  %util sda               0,99 0,00
> 14,78 136,45   390,15  2825,62    21,26 21,86  144,52   2,58 39,01
>
> avg-cpu:  %user   %nice %system %iowait  %steal   %idle 38,75 0,00
> 48,75    1,00    0,00   11,50
>
> Device:         rrqm/s   wrqm/s   r/s   w/s   rsec/s   wsec/s
> avgrq-sz avgqu-sz   await  svctm  %util sda               0,00 0,00
> 7,54 67,34   377,89  1764,82    28,62 5,38   71,89   2,95 22,11
You see that your average wait time 'await' is quite high. That
indicates some contention. You have somewhere between 50-200
writes/second, so you may be maxing out your disk (depending on your
config those writes may mainly go to one disk at a time).


>>> One possible solution is to use something like memcached to store
>>> the last read post in memory and periodically write it into the
>>> database.
> We're starting using memcached. But how would you "periodically"
> write that to database?
Where do you see the problem?

>>> Which pg version are you using?
> I should have mentionned that before sorry: PostgreSQL 8.2
I definitely would consider upgrading to 8.3 - even without any config
changes it might bring quite some improvement.

But mainly it would allow you to use "asynchronous commit" - which could
possibly increase your throughput tremendously.
It has the drawback that you possibly loose async transactions in case
of crash - but that doesn't sound too bad for your use case (use it only
in the transactions where it makes sense).


But all of that does not explain the issue sufficiently - you should not
get that slow updates.
I would suggest you configure "log_min_statement_duration" to get the
slower queries.
You then should run those slow statements using 'EXPLAIN ANALYZE' to see
where the time is spent.

How are you vacuuming?


Andres

Re: How would you store read/unread topic status?

От
Robert Haas
Дата:
>>>> Which pg version are you using?
>>
>> I should have mentionned that before sorry: PostgreSQL 8.2
>
> I definitely would consider upgrading to 8.3 - even without any config
> changes it might bring quite some improvement.
>
> But mainly it would allow you to use "asynchronous commit" - which could
> possibly increase your throughput tremendously.

HOT can potentitally help a lot for this workload, too, if the columns
being updated are not indexed.

...Robert

Re: How would you store read/unread topic status?

От
Grzegorz Jaśkiewicz
Дата:
not better just to store last time user visited the topic ? or forum in general, and compare that ?

Re: How would you store read/unread topic status?

От
Mathieu Nebra
Дата:
Robert Haas a écrit :
>>>> Which pg version are you using?
>> I should have mentionned that before sorry: PostgreSQL 8.2
>
> I think there is an awful lot of speculation on this thread about what
> your problem is without anywhere near enough investigation.  A couple
> of seconds for an update is a really long time, unless your server is
> absolutely slammed, in which case probably everything is taking a long
> time.  We need to get some more information on what is happening here.

You're right, I'll give you the information you need.

>  Approximately how many requests per second are you servicing?  Also,

How can I extract this information from the database? I know how to use
pg_stat_user_tables. My table has:

seq_tup_read
133793491714

idx_scan
12408612540

idx_tup_fetch
41041660903

n_tup_ins
14700038

n_tup_upd
6698236

n_tup_del
15990670

> can you:
>
> 1. Run EXPLAIN ANALYZE on a representative UPDATE statement and post
> the exact query and the output.

"Index Scan using prj_frm_flg_pkey on prj_frm_flg  (cost=0.00..8.58
rows=1 width=18)"
"  Index Cond: ((flg_mid = 3) AND (flg_sid = 123764))"

This time it only took 54ms, but maybe it's already a lot.


>
> 2. Run VACUUM VERBOSE on your database and send the last 10 lines or
> so of the output.

It's not very long, I can give you the whole log:

INFO:  vacuuming "public.prj_frm_flg"INFO:  scanned index
"prj_frm_flg_pkey" to remove 74091 row versions
DETAIL:  CPU 0.15s/0.47u sec elapsed 53.10 sec.INFO:  scanned index
"flg_fav" to remove 74091 row versions
DETAIL:  CPU 0.28s/0.31u sec elapsed 91.82 sec.INFO:  scanned index
"flg_notif" to remove 74091 row versions
DETAIL:  CPU 0.36s/0.37u sec elapsed 80.75 sec.INFO:  scanned index
"flg_post" to remove 74091 row versions
DETAIL:  CPU 0.31s/0.37u sec elapsed 115.86 sec.INFO:  scanned index
"flg_no_inter" to remove 74091 row versions
DETAIL:  CPU 0.34s/0.33u sec elapsed 68.96 sec.INFO:  "prj_frm_flg":
removed 74091 row versions in 5979 pages
DETAIL:  CPU 0.29s/0.34u sec elapsed 100.37 sec.INFO:  index
"prj_frm_flg_pkey" now contains 1315895 row versions in 7716 pages
DETAIL:  63153 index row versions were removed.
672 index pages have been deleted, 639 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.INFO:  index "flg_fav" now contains
1315895 row versions in 18228 pages
DETAIL:  73628 index row versions were removed.
21 index pages have been deleted, 16 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.INFO:  index "flg_notif" now
contains 1315895 row versions in 18179 pages
DETAIL:  73468 index row versions were removed.
22 index pages have been deleted, 13 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.INFO:  index "flg_post" now
contains 1315895 row versions in 18194 pages
DETAIL:  73628 index row versions were removed.
30 index pages have been deleted, 23 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.INFO:  index "flg_no_inter" now
contains 1315895 row versions in 8596 pages
DETAIL:  73628 index row versions were removed.
13 index pages have been deleted, 8 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.INFO:  "prj_frm_flg": found 74091
removable, 1315895 nonremovable row versions in 10485 pages
DETAIL:  326 dead row versions cannot be removed yet.
There were 253639 unused item pointers.
10431 pages contain useful free space.
0 pages are entirely empty.
CPU 1.91s/2.28u sec elapsed 542.75 sec.

Total: 542877 ms.

>
> 3. Try your UPDATE statement at a low-traffic time of day and see
> whether it's faster than it is at a high-traffic time of day, and by
> how much.  Or dump your database and reload it on a dev server and see
> how fast it runs there.

It took 4ms.

Re: How would you store read/unread topic status?

От
Robert Haas
Дата:
On Tue, Jun 23, 2009 at 11:50 AM, Mathieu Nebra<mateo21@siteduzero.com> wrote:
>>  Approximately how many requests per second are you servicing?  Also,
>
> How can I extract this information from the database? I know how to use
> pg_stat_user_tables. My table has:

I was thinking you might look at your httpd logs.  Not sure how to get
it otherwise.

>> can you:
>>
>> 1. Run EXPLAIN ANALYZE on a representative UPDATE statement and post
>> the exact query and the output.
>
> "Index Scan using prj_frm_flg_pkey on prj_frm_flg  (cost=0.00..8.58
> rows=1 width=18)"
> "  Index Cond: ((flg_mid = 3) AND (flg_sid = 123764))"
>
> This time it only took 54ms, but maybe it's already a lot.

That looks like EXPLAIN, not EXPLAIN ANALYZE.  And can we also have the query?

>> 2. Run VACUUM VERBOSE on your database and send the last 10 lines or
>> so of the output.
>
> It's not very long, I can give you the whole log:
>
> INFO:  vacuuming "public.prj_frm_flg"INFO:  scanned index
> "prj_frm_flg_pkey" to remove 74091 row versions
> DETAIL:  CPU 0.15s/0.47u sec elapsed 53.10 sec.INFO:  scanned index
> "flg_fav" to remove 74091 row versions
> DETAIL:  CPU 0.28s/0.31u sec elapsed 91.82 sec.INFO:  scanned index
> "flg_notif" to remove 74091 row versions
> DETAIL:  CPU 0.36s/0.37u sec elapsed 80.75 sec.INFO:  scanned index
> "flg_post" to remove 74091 row versions
> DETAIL:  CPU 0.31s/0.37u sec elapsed 115.86 sec.INFO:  scanned index
> "flg_no_inter" to remove 74091 row versions
> DETAIL:  CPU 0.34s/0.33u sec elapsed 68.96 sec.INFO:  "prj_frm_flg":
> removed 74091 row versions in 5979 pages
> DETAIL:  CPU 0.29s/0.34u sec elapsed 100.37 sec.INFO:  index
> "prj_frm_flg_pkey" now contains 1315895 row versions in 7716 pages
> DETAIL:  63153 index row versions were removed.
> 672 index pages have been deleted, 639 are currently reusable.
> CPU 0.00s/0.00u sec elapsed 0.00 sec.INFO:  index "flg_fav" now contains
> 1315895 row versions in 18228 pages
> DETAIL:  73628 index row versions were removed.
> 21 index pages have been deleted, 16 are currently reusable.
> CPU 0.00s/0.00u sec elapsed 0.00 sec.INFO:  index "flg_notif" now
> contains 1315895 row versions in 18179 pages
> DETAIL:  73468 index row versions were removed.
> 22 index pages have been deleted, 13 are currently reusable.
> CPU 0.00s/0.00u sec elapsed 0.00 sec.INFO:  index "flg_post" now
> contains 1315895 row versions in 18194 pages
> DETAIL:  73628 index row versions were removed.
> 30 index pages have been deleted, 23 are currently reusable.
> CPU 0.00s/0.00u sec elapsed 0.00 sec.INFO:  index "flg_no_inter" now
> contains 1315895 row versions in 8596 pages
> DETAIL:  73628 index row versions were removed.
> 13 index pages have been deleted, 8 are currently reusable.
> CPU 0.00s/0.00u sec elapsed 0.00 sec.INFO:  "prj_frm_flg": found 74091
> removable, 1315895 nonremovable row versions in 10485 pages
> DETAIL:  326 dead row versions cannot be removed yet.
> There were 253639 unused item pointers.
> 10431 pages contain useful free space.
> 0 pages are entirely empty.
> CPU 1.91s/2.28u sec elapsed 542.75 sec.
>
> Total: 542877 ms.

Is that just for the one table?  I meant a database-wide VACUUM
VERBOSE, so you can see if you've blown out your free-space map.

>> 3. Try your UPDATE statement at a low-traffic time of day and see
>> whether it's faster than it is at a high-traffic time of day, and by
>> how much.  Or dump your database and reload it on a dev server and see
>> how fast it runs there.
>
> It took 4ms.

Was that at a low traffic time of day, or on a different server?

...Robert

Re: How would you store read/unread topic status?

От
Mike
Дата:
So your update doesn't take long to run during off-peak times, so
basically your options are:

1. Optimize your postgresql.conf settings or upgrade to the latest
version of PostgreSQL.

2. Redesign your forum code so it can scale better.

3. Upgrade your servers hardware as it may be overloaded.

I would probably attack those in the order I described.

As far as redesigning your forum code, keep in mind that in PostgreSQL
an update is basically a select, delete, insert in a single statement.
First it needs to find the rows to update, it marks the rows for
deletion (which vacuum later does) and inserts a new row. So updates
can be quite expensive.

In SOME situations, it can be faster to do inserts only, and modify
your select query to get just the data you need, for example:

Rather then an update like this:

update <table> set LastReadAnswerID = <value> where UserID = <value>
AND TopicID = <value>

You could do this instead:

insert into <table> VALUES(<user_id>,<topic_id>,<last_read_answer_id>)

Then just modify your select statement slightly to get the last
inserted row:

select * from <table> where user_id = <value> AND topic_id = <value>
order by LastReadAnswerID DESC LIMIT 1

This makes your select statement slightly more expensive but your
insert statement pretty much as cheap as possible. Since its much
easier to cache select results you could easily wrap some caching
mechanism around your select query to reduce the load there too.

Then using a task scheduler like cron simply clear out old rows from the
table you insert into every minute, 5 minutes, hour, day, whatever makes
most sense to keep the select queries fast.

A memcached solution would probably be much better, but its also likely
much more involved to do.


On Tue, 23 Jun 2009 17:50:50 +0200
Mathieu Nebra <mateo21@siteduzero.com> wrote:

> Robert Haas a écrit :
> >>>> Which pg version are you using?
> >> I should have mentionned that before sorry: PostgreSQL 8.2
> >
> > I think there is an awful lot of speculation on this thread about
> > what your problem is without anywhere near enough investigation.  A
> > couple of seconds for an update is a really long time, unless your
> > server is absolutely slammed, in which case probably everything is
> > taking a long time.  We need to get some more information on what
> > is happening here.
>
> You're right, I'll give you the information you need.
>
> >  Approximately how many requests per second are you servicing?
> > Also,
>
> How can I extract this information from the database? I know how to
> use pg_stat_user_tables. My table has:
>
> seq_tup_read
> 133793491714
>
> idx_scan
> 12408612540
>
> idx_tup_fetch
> 41041660903
>
> n_tup_ins
> 14700038
>
> n_tup_upd
> 6698236
>
> n_tup_del
> 15990670
>
> > can you:
> >
> > 1. Run EXPLAIN ANALYZE on a representative UPDATE statement and post
> > the exact query and the output.
>
> "Index Scan using prj_frm_flg_pkey on prj_frm_flg  (cost=0.00..8.58
> rows=1 width=18)"
> "  Index Cond: ((flg_mid = 3) AND (flg_sid = 123764))"
>
> This time it only took 54ms, but maybe it's already a lot.
>
>
> >
> > 2. Run VACUUM VERBOSE on your database and send the last 10 lines or
> > so of the output.
>
> It's not very long, I can give you the whole log:
>
> INFO:  vacuuming "public.prj_frm_flg"INFO:  scanned index
> "prj_frm_flg_pkey" to remove 74091 row versions
> DETAIL:  CPU 0.15s/0.47u sec elapsed 53.10 sec.INFO:  scanned index
> "flg_fav" to remove 74091 row versions
> DETAIL:  CPU 0.28s/0.31u sec elapsed 91.82 sec.INFO:  scanned index
> "flg_notif" to remove 74091 row versions
> DETAIL:  CPU 0.36s/0.37u sec elapsed 80.75 sec.INFO:  scanned index
> "flg_post" to remove 74091 row versions
> DETAIL:  CPU 0.31s/0.37u sec elapsed 115.86 sec.INFO:  scanned index
> "flg_no_inter" to remove 74091 row versions
> DETAIL:  CPU 0.34s/0.33u sec elapsed 68.96 sec.INFO:  "prj_frm_flg":
> removed 74091 row versions in 5979 pages
> DETAIL:  CPU 0.29s/0.34u sec elapsed 100.37 sec.INFO:  index
> "prj_frm_flg_pkey" now contains 1315895 row versions in 7716 pages
> DETAIL:  63153 index row versions were removed.
> 672 index pages have been deleted, 639 are currently reusable.
> CPU 0.00s/0.00u sec elapsed 0.00 sec.INFO:  index "flg_fav" now
> contains 1315895 row versions in 18228 pages
> DETAIL:  73628 index row versions were removed.
> 21 index pages have been deleted, 16 are currently reusable.
> CPU 0.00s/0.00u sec elapsed 0.00 sec.INFO:  index "flg_notif" now
> contains 1315895 row versions in 18179 pages
> DETAIL:  73468 index row versions were removed.
> 22 index pages have been deleted, 13 are currently reusable.
> CPU 0.00s/0.00u sec elapsed 0.00 sec.INFO:  index "flg_post" now
> contains 1315895 row versions in 18194 pages
> DETAIL:  73628 index row versions were removed.
> 30 index pages have been deleted, 23 are currently reusable.
> CPU 0.00s/0.00u sec elapsed 0.00 sec.INFO:  index "flg_no_inter" now
> contains 1315895 row versions in 8596 pages
> DETAIL:  73628 index row versions were removed.
> 13 index pages have been deleted, 8 are currently reusable.
> CPU 0.00s/0.00u sec elapsed 0.00 sec.INFO:  "prj_frm_flg": found 74091
> removable, 1315895 nonremovable row versions in 10485 pages
> DETAIL:  326 dead row versions cannot be removed yet.
> There were 253639 unused item pointers.
> 10431 pages contain useful free space.
> 0 pages are entirely empty.
> CPU 1.91s/2.28u sec elapsed 542.75 sec.
>
> Total: 542877 ms.
>
> >
> > 3. Try your UPDATE statement at a low-traffic time of day and see
> > whether it's faster than it is at a high-traffic time of day, and by
> > how much.  Or dump your database and reload it on a dev server and
> > see how fast it runs there.
>
> It took 4ms.
>

Re: How would you store read/unread topic status?

От
Mike
Дата:
So your update doesn't take long to run during off-peak times, so
basically your options are:

1. Optimize your postgresql.conf settings or upgrade to the latest
version of PostgreSQL.

2. Redesign your forum code so it can scale better.

3. Upgrade your servers hardware as it may be overloaded.

I would probably attack those in the order I described.

As far as redesigning your forum code, keep in mind that in PostgreSQL
an update is basically a select, delete, insert in a single statement.
First it needs to find the rows to update, it marks the rows for
deletion (which vacuum later does) and inserts a new row. So updates
can be quite expensive.

In SOME situations, it can be faster to do inserts only, and modify
your select query to get just the data you need, for example:

Rather then an update like this:

update <table> set LastReadAnswerID = <value> where UserID = <value>
AND TopicID = <value>

You could do this instead:

insert into <table> VALUES(<user_id>,<topic_id>,<last_read_answer_id>)

Then just modify your select statement slightly to get the last
inserted row:

select * from <table> where user_id = <value> AND topic_id = <value>
order by LastReadAnswerID DESC LIMIT 1

This makes your select statement slightly more expensive but your
insert statement pretty much as cheap as possible. Since its much
easier to cache select results you could easily wrap some caching
mechanism around your select query to reduce the load there too.

Then using a task scheduler like cron simply clear out old rows from the
table you insert into every minute, 5 minutes, hour, day, whatever makes
most sense to keep the select queries fast.

A memcached solution would probably be much better, but its also likely
much more involved to do.



On Tue, 23 Jun 2009 17:50:50 +0200
Mathieu Nebra <mateo21@siteduzero.com> wrote:

> Robert Haas a écrit :
> >>>> Which pg version are you using?
> >> I should have mentionned that before sorry: PostgreSQL 8.2
> >
> > I think there is an awful lot of speculation on this thread about
> > what your problem is without anywhere near enough investigation.  A
> > couple of seconds for an update is a really long time, unless your
> > server is absolutely slammed, in which case probably everything is
> > taking a long time.  We need to get some more information on what
> > is happening here.
>
> You're right, I'll give you the information you need.
>
> >  Approximately how many requests per second are you servicing?
> > Also,
>
> How can I extract this information from the database? I know how to
> use pg_stat_user_tables. My table has:
>
> seq_tup_read
> 133793491714
>
> idx_scan
> 12408612540
>
> idx_tup_fetch
> 41041660903
>
> n_tup_ins
> 14700038
>
> n_tup_upd
> 6698236
>
> n_tup_del
> 15990670
>
> > can you:
> >
> > 1. Run EXPLAIN ANALYZE on a representative UPDATE statement and post
> > the exact query and the output.
>
> "Index Scan using prj_frm_flg_pkey on prj_frm_flg  (cost=0.00..8.58
> rows=1 width=18)"
> "  Index Cond: ((flg_mid = 3) AND (flg_sid = 123764))"
>
> This time it only took 54ms, but maybe it's already a lot.
>
>
> >
> > 2. Run VACUUM VERBOSE on your database and send the last 10 lines or
> > so of the output.
>
> It's not very long, I can give you the whole log:
>
> INFO:  vacuuming "public.prj_frm_flg"INFO:  scanned index
> "prj_frm_flg_pkey" to remove 74091 row versions
> DETAIL:  CPU 0.15s/0.47u sec elapsed 53.10 sec.INFO:  scanned index
> "flg_fav" to remove 74091 row versions
> DETAIL:  CPU 0.28s/0.31u sec elapsed 91.82 sec.INFO:  scanned index
> "flg_notif" to remove 74091 row versions
> DETAIL:  CPU 0.36s/0.37u sec elapsed 80.75 sec.INFO:  scanned index
> "flg_post" to remove 74091 row versions
> DETAIL:  CPU 0.31s/0.37u sec elapsed 115.86 sec.INFO:  scanned index
> "flg_no_inter" to remove 74091 row versions
> DETAIL:  CPU 0.34s/0.33u sec elapsed 68.96 sec.INFO:  "prj_frm_flg":
> removed 74091 row versions in 5979 pages
> DETAIL:  CPU 0.29s/0.34u sec elapsed 100.37 sec.INFO:  index
> "prj_frm_flg_pkey" now contains 1315895 row versions in 7716 pages
> DETAIL:  63153 index row versions were removed.
> 672 index pages have been deleted, 639 are currently reusable.
> CPU 0.00s/0.00u sec elapsed 0.00 sec.INFO:  index "flg_fav" now
> contains 1315895 row versions in 18228 pages
> DETAIL:  73628 index row versions were removed.
> 21 index pages have been deleted, 16 are currently reusable.
> CPU 0.00s/0.00u sec elapsed 0.00 sec.INFO:  index "flg_notif" now
> contains 1315895 row versions in 18179 pages
> DETAIL:  73468 index row versions were removed.
> 22 index pages have been deleted, 13 are currently reusable.
> CPU 0.00s/0.00u sec elapsed 0.00 sec.INFO:  index "flg_post" now
> contains 1315895 row versions in 18194 pages
> DETAIL:  73628 index row versions were removed.
> 30 index pages have been deleted, 23 are currently reusable.
> CPU 0.00s/0.00u sec elapsed 0.00 sec.INFO:  index "flg_no_inter" now
> contains 1315895 row versions in 8596 pages
> DETAIL:  73628 index row versions were removed.
> 13 index pages have been deleted, 8 are currently reusable.
> CPU 0.00s/0.00u sec elapsed 0.00 sec.INFO:  "prj_frm_flg": found 74091
> removable, 1315895 nonremovable row versions in 10485 pages
> DETAIL:  326 dead row versions cannot be removed yet.
> There were 253639 unused item pointers.
> 10431 pages contain useful free space.
> 0 pages are entirely empty.
> CPU 1.91s/2.28u sec elapsed 542.75 sec.
>
> Total: 542877 ms.
>
> >
> > 3. Try your UPDATE statement at a low-traffic time of day and see
> > whether it's faster than it is at a high-traffic time of day, and by
> > how much.  Or dump your database and reload it on a dev server and
> > see how fast it runs there.
>
> It took 4ms.
>

Re: How would you store read/unread topic status?

От
Scott Carey
Дата:
You're holding this behavior to far too strict of a transactional guarantee.

The client software can cache a set of recent views, and sent updates in
bulk every 1 or 2 seconds.  Worst case, if your client crashes you lose a
second worth of user metadata updates on last accessed and view counts.
This isn't a financial transaction, don't build the app like one.

The same facility can serve as a read cache for other bits that don't need
to be 'perfect' in the transactional sense -- counts on the number of views
/ posts of a topic, etc.  Using the db to store and retrieve such counts
synchronously is frankly, a bad application design.


The tricky part with the above is two fold:  you need to have client
software capable of a thread-safe shared cache, and the clients will have to
have sticky-session if you are load balancing.  Corner cases such as a
server going down and a user switching servers will need to be worked out.


On 6/23/09 4:12 AM, "Mathieu Nebra" <mateo21@siteduzero.com> wrote:

> Hi all,
>
> I'm running a quite large website which has its own forums. They are
> currently heavily used and I'm getting performance issues. Most of them
> are due to repeated UPDATE queries on a "flags" table.
>
> This "flags" table has more or less the following fields:
>
> UserID - TopicID - LastReadAnswerID
>
> The flags table keeps track of every topic a member has visited and
> remembers the last answer which was posted at this moment. It allows the
> user to come back a few days after and immediately jump to the last
> answer he has not read.
>
> My problem is that everytime a user READS a topic, it UPDATES this flags
> table to remember he has read it. This leads to multiple updates at the
> same time on the same table, and an update can take a few seconds. This
> is not acceptable for my users.
>
> Question: what is the general rule of thumb here? How would you store
> this information?
>
> Thanks a lot in advance.
> Mathieu.
>
> --
> Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance
>


Re: How would you store read/unread topic status?

От
Scott Carey
Дата:
On 6/23/09 7:54 AM, "Mathieu Nebra" <mateo21@siteduzero.com> wrote:

>> On 06/23/2009 01:12 PM, Mathieu Nebra wrote:
>>>>> I'm running a quite large website which has its own forums. They are
>>>>> currently heavily used and I'm getting performance issues. Most of
> them
>>>>> are due to repeated UPDATE queries on a "flags" table.
>>>>>
>>>>> This "flags" table has more or less the following fields:
>>>>>
>>>>> UserID - TopicID - LastReadAnswerID
>>>>>
>>>>> The flags table keeps track of every topic a member has visited and
>>>>> remembers the last answer which was posted at this moment. It
> allows the
>>>>> user to come back a few days after and immediately jump to the last
>>>>> answer he has not read.
>>>>> My problem is that everytime a user READS a topic, it UPDATES this
> flags
>>>>> table to remember he has read it. This leads to multiple updates
> at the
>>>>> same time on the same table, and an update can take a few seconds.
> This
>>>>> is not acceptable for my users.
>>> Have you analyzed why it takes that long? Determining that is the first
>>> step of improving the current situation...
>>>
>>> My first guess would be, that your disks cannot keep up with the number
>>> of syncronous writes/second. Do you know how many transactions with
>>> write access you have? Guessing from your description you do at least
>>> one write for every page hit on your forum.
>
> I don't know how many writes/s Pgsql can handle on my server, but I
> first suspected that it was good practice to avoid unnecessary writes.
>
> I do 1 write/page for every connected user on the forums.
> I do the same on another part of my website to increment the number of
> page views (this was not part of my initial question but it is very close).
>
>>>
>>> With the default settings every transaction needs to wait for io at the
>>> end - to ensure transactional semantics.
>>> Depending on your disk the number of possible writes/second is quite low
>>> - a normal SATA disk with 7200rpm can satisfy something around 130
>>> syncronous writes per second. Which is the upper limit on writing
>>> transactions per second.
>>> What disks do you have?
>
> We have 2 SAS RAID 0 15000rpm disks.
>
>>>
>>> On which OS are you? If you are on linux you could use iostat to get
>>> some relevant statistics like:
>>> iostat -x /path/to/device/the/database/resides/on 2 10
>>>
>>> That gives you 10 statistics over periods of 2 seconds.
>>>
>>>
>>> Depending on those results there are numerous solutions to that
> problem...
>
> Here it is:
>
> $ iostat -x /dev/sda 2 10
> Linux 2.6.18-6-amd64 (scratchy)         23.06.2009
>
> avg-cpu:  %user   %nice %system %iowait  %steal   %idle
>           18,02    0,00   12,87   13,13    0,00   55,98
>
> Device:         rrqm/s   wrqm/s   r/s   w/s   rsec/s   wsec/s avgrq-sz
> avgqu-sz   await  svctm  %util
> sda               0,94   328,98 29,62 103,06   736,58  6091,14    51,46
>     0,04    0,25   0,04   0,51
>
> avg-cpu:  %user   %nice %system %iowait  %steal   %idle
>           39,65    0,00   48,38    2,00    0,00    9,98
>
> Device:         rrqm/s   wrqm/s   r/s   w/s   rsec/s   wsec/s avgrq-sz
> avgqu-sz   await  svctm  %util
> sda               0,00     0,00 10,00 78,00   516,00  1928,00    27,77
>    6,44   73,20   2,75  24,20
>
> avg-cpu:  %user   %nice %system %iowait  %steal   %idle
>           40,15    0,00   48,13    2,24    0,00    9,48
>
> Device:         rrqm/s   wrqm/s   r/s   w/s   rsec/s   wsec/s avgrq-sz
> avgqu-sz   await  svctm  %util
> sda               0,00     0,00  6,47 100,50   585,07  2288,56    26,87
>    13,00  121,56   3,00  32,04
>
> avg-cpu:  %user   %nice %system %iowait  %steal   %idle
>           45,14    0,00   45,64    6,73    0,00    2,49
>
> Device:         rrqm/s   wrqm/s   r/s   w/s   rsec/s   wsec/s avgrq-sz
> avgqu-sz   await  svctm  %util
> sda               1,00     0,00 34,00 157,50  1232,00  3904,00    26,82
>    26,64  139,09   3,03  58,00
>
> avg-cpu:  %user   %nice %system %iowait  %steal   %idle
>           46,25    0,00   49,25    3,50    0,00    1,00
>
> Device:         rrqm/s   wrqm/s   r/s   w/s   rsec/s   wsec/s avgrq-sz
> avgqu-sz   await  svctm  %util
> sda               0,00     0,00 27,00 173,00   884,00  4224,00    25,54
>    24,46  122,32   3,00  60,00
>
> avg-cpu:  %user   %nice %system %iowait  %steal   %idle
>           44,42    0,00   47,64    2,23    0,00    5,71
>
> Device:         rrqm/s   wrqm/s   r/s   w/s   rsec/s   wsec/s avgrq-sz
> avgqu-sz   await  svctm  %util
> sda               0,00     0,00 15,42 140,30   700,50  3275,62    25,53
>    17,94  115,21   2,81  43,78
>
> avg-cpu:  %user   %nice %system %iowait  %steal   %idle
>           41,75    0,00   48,50    2,50    0,00    7,25
>
> Device:         rrqm/s   wrqm/s   r/s   w/s   rsec/s   wsec/s avgrq-sz
> avgqu-sz   await  svctm  %util
> sda               0,50     0,00 21,11 116,08   888,44  2472,36    24,50
>    12,62   91,99   2,55  34,97
>
> avg-cpu:  %user   %nice %system %iowait  %steal   %idle
>           44,03    0,00   46,27    2,99    0,00    6,72
>
> Device:         rrqm/s   wrqm/s   r/s   w/s   rsec/s   wsec/s avgrq-sz
> avgqu-sz   await  svctm  %util
> sda               9,00     0,00 10,00 119,00   484,00  2728,00    24,90
>    15,15  117,47   2,70  34,80
>
> avg-cpu:  %user   %nice %system %iowait  %steal   %idle
>           36,91    0,00   51,37    2,49    0,00    9,23
>
> Device:         rrqm/s   wrqm/s   r/s   w/s   rsec/s   wsec/s avgrq-sz
> avgqu-sz   await  svctm  %util
> sda               0,99     0,00 14,78 136,45   390,15  2825,62    21,26
>    21,86  144,52   2,58  39,01
>
> avg-cpu:  %user   %nice %system %iowait  %steal   %idle
>           38,75    0,00   48,75    1,00    0,00   11,50
>
> Device:         rrqm/s   wrqm/s   r/s   w/s   rsec/s   wsec/s avgrq-sz
> avgqu-sz   await  svctm  %util
> sda               0,00     0,00  7,54 67,34   377,89  1764,82    28,62
>    5,38   71,89   2,95  22,11
>


I see a lot of io wait time there.  My guess is that your DB is flooded with
synchronous writes.

IF you want to optimize the hardware for this you have a couple options.
I'm assuming your RAID 0 is not hardware RAID.

1.  Use 8.3+ and asynchronous commit  (set synchronous_commit=false).  This
is safe data wise, but if your DB crashes you might lose the last second of
transactions or so that the app thought were comitted.  For a DB forum, this
is probably very acceptable.   Performance should significantly gain as the
writes/sec will go down a lot.

2. put your data on one partition and your WAL log on another.

3. Get a battery backed hardware raid with write-back caching.

4. If you are using ext3 on linux, make sure you mount with data=writeback
on the file system that your wal logs are on. data=ordered will cause the
WHOLE file sytem to be flushed for each fsync, not just the tiny bit of WAL
log.

In short, if you combined 1,2, and 4, you'll probably have significantly
more capacity on the same server.  So make sure your WAL log is in a
different file system from your OS and data, mount it optimally, and
consider turning synchronous_commit off.

If you're using RAID 0, I doubt the data is so precious that
synchronous_commit being true is important at all.


>
>
>>>
>>>>> Question: what is the general rule of thumb here? How would you store
>>>>> this information?
>>> The problem here is, that every read access writes to disk - that is not
>>> going to scale very well.
>
> That's what I thought.
>
>>> One possible solution is to use something like memcached to store the
>>> last read post in memory and periodically write it into the database.
>>>
>
> We're starting using memcached. But how would you "periodically" write
> that to database?
>
>>>
>>> Which pg version are you using?
>
> I should have mentionned that before sorry: PostgreSQL 8.2
>
> Thanks a lot!
>
>
>
> Andres Freund a écrit :


Re: How would you store read/unread topic status?

От
Greg Stark
Дата:
All the other comments are accurate, though it does seem like
something the database ought to be able to handle.

The other thing which hasn't been mentioned is that you have a lot of
indexes. Updates require maintaining all those indexes. Are all of
these indexes really necessary? Do you have routine queries which look
up users based on their flags? Or all all your oltp transactions for
specific userids in which case you probably just need the index on
userid.

You'll probably find 8.3 helps this workload more than any tuning you
can do in the database though. Especially if you can reduce the number
of indexes and avoid an index on any flags that are being updated.

Re: How would you store read/unread topic status?

От
Mathieu Nebra
Дата:
Greg Stark a écrit :
> All the other comments are accurate, though it does seem like
> something the database ought to be able to handle.
>
> The other thing which hasn't been mentioned is that you have a lot of
> indexes. Updates require maintaining all those indexes. Are all of
> these indexes really necessary? Do you have routine queries which look
> up users based on their flags? Or all all your oltp transactions for
> specific userids in which case you probably just need the index on
> userid.


We are using these indexes, but I can't be sure if we _really_ need them
or not.

I can go into detail. We have:

UserID - TopicID - LastReadAnswerID - WrittenStatus - IsFavorite

So basically, we toggle the boolean flag WrittenStatus when the user has
written in that topic. The same goes for IsFavorite.

We have indexes on them, so we can SELECT every topic WHERE the user has
written. Is it the good way of doing this?


Oh, I've made a mistake before, we have RAID 1 disks, not RAID 0.


>
> You'll probably find 8.3 helps this workload more than any tuning you
> can do in the database though. Especially if you can reduce the number
> of indexes and avoid an index on any flags that are being updated.

I'll start this way, thanks. First 8.3, then I'll check my flags.

I have a lot of ways to investigate and I would like to thank every
contributor here. I might come again with more precise information.

Thanks.

Re: How would you store read/unread topic status?

От
Alexander Staubo
Дата:
On Tue, Jun 23, 2009 at 1:12 PM, Mathieu Nebra<mateo21@siteduzero.com> wrote:
> The flags table keeps track of every topic a member has visited and
> remembers the last answer which was posted at this moment. It allows the
> user to come back a few days after and immediately jump to the last
> answer he has not read.

I forgot to mention that we speed up our queries by caching the "last
read" ID in Memcached. This is the kind of thing that Memcached is
ideal for.

For example, we show the list of the most recent posts, along with a
comment count, eg. "42 comments (6 new)". We found that joining posts
against the last-read table is expensive, so instead we read from
Memcached on every post to find the number of unread comments.

We use the thread's "last commented at" timestamp as part of the key
so that when somebody posts a new comment, every user's cached unread
count is invalidated; it is automatically recalculated the next time
they view the post.

A.

Re: How would you store read/unread topic status?

От
Craig James
Дата:
Mathieu Nebra wrote:
> Greg Stark a écrit :
>> All the other comments are accurate, though it does seem like
>> something the database ought to be able to handle.
>>
>> The other thing which hasn't been mentioned is that you have a lot of
>> indexes. Updates require maintaining all those indexes. Are all of
>> these indexes really necessary? Do you have routine queries which look
>> up users based on their flags? Or all all your oltp transactions for
>> specific userids in which case you probably just need the index on
>> userid.
>
>
> We are using these indexes, but I can't be sure if we _really_ need them
> or not.
>
> I can go into detail. We have:
>
> UserID - TopicID - LastReadAnswerID - WrittenStatus - IsFavorite
>
> So basically, we toggle the boolean flag WrittenStatus when the user has
> written in that topic. The same goes for IsFavorite.

Do those last two columns hold much data?  Another thing to consider is to split this into two tables:

  UserID - TopicID - LastReadAnswerID

  UserID - TopicID - WrittenStatus - IsFavorite

As others have pointed out, an UPDATE in Postgres is a select/delete/insert, and if you're updating just the
LastReadAnswerIDall the time, you're wasting time deleting and re-inserting a lot of data that never change (assuming
they'renot trivially small columns). 

This might also solve the problem of too many indexes -- the table that's updated frequently would only have an index
on(UserID, TopicID), so the update only affects one index. 

Then to minimize the impact on your app, create a view that looks like the original table for read-only apps.

Craig

Re: How would you store read/unread topic status?

От
Greg Stark
Дата:
On Tue, Jun 23, 2009 at 9:04 PM, Mathieu Nebra<mateo21@siteduzero.com> wrote:
> We have indexes on them, so we can SELECT every topic WHERE the user has
> written. Is it the good way of doing this?

I'm kind of skeptical that a simple index on userid,topic isn't
sufficient to handle this case. But you would have to test it on
actual data to be sure. It depends whether you have enough topics and
enough userid,topic records for a given userid that scanning all the
topics for a given user is actually too slow.

Even if it's necessary you might consider having a "partial" index on
user,topic WHERE writtenstatus instead of having a three-column index.

--
greg
http://mit.edu/~gsstark/resume.pdf

Re: How would you store read/unread topic status?

От
Chris St Denis
Дата:
Mathieu Nebra wrote:
Alexander Staubo a écrit : 
On Tue, Jun 23, 2009 at 1:12 PM, Mathieu Nebra<mateo21@siteduzero.com> wrote:   
This "flags" table has more or less the following fields:

UserID - TopicID - LastReadAnswerID     
We are doing pretty much same thing.
   
My problem is that everytime a user READS a topic, it UPDATES this flags
table to remember he has read it. This leads to multiple updates at the
same time on the same table, and an update can take a few seconds. This
is not acceptable for my users.     
First of all, and I'm sure you thought of this, an update isn't needed
every time a user reads a topic; only when there are new answers that
need to be marked as read. So an "update ... where last_read_answer_id
< ?" should avoid the need for an update.   
We don't work that way. We just "remember" he has read these answers and
then we can tell him "there are no new messages for you to read".
So we just need to write what he has read when he reads it.
 
(That said, I believe PostgreSQL diffs tuple updates, so in practice
PostgreSQL might not be writing anything if you run an "update" with
the same value. I will let someone more intimate with the internal
details of updates to comment on this.)

Secondly, an update should not take "a few seconds". You might want to
investigate this part before you turn to further optimizations.   
Yes, I know there is a problem but I don't know if I am competent enough
to tune PostgreSQL for that. It can take a while to understand the
problem, and I'm not sure I'll have the time for that.

I am, however, opened to suggestions. Maybe I'm doing something wrong
somewhere.
 
In our application we defer the updates to a separate asynchronous
process using a simple queue mechanism, but in our case, we found that
the updates are fast enough (in the order of a few milliseconds) not
to warrant batching them into single transactions.   
A few milliseconds would be cool.
In fact, defering to another process is a good idea, but I'm not sure if
it is easy to implement. It would be great to have some sort of UPDATE
... LOW PRIORITY to make the request non blocking.

Thanks.
 
I use pg_send_query() <http://ca2.php.net/manual/en/function.pg-send-query.php> in php to achieve this for a views counter. "Script execution is not blocked while the queries are executing."

It looks like this may just be a direct translation of PQsendQuery() from libpq. Your preferred language may have a function like this.

Re: How would you store read/unread topic status?

От
Mathieu Nebra
Дата:
Craig James a écrit :
> Mathieu Nebra wrote:
>> Greg Stark a écrit :
>>> All the other comments are accurate, though it does seem like
>>> something the database ought to be able to handle.
>>>
>>> The other thing which hasn't been mentioned is that you have a lot of
>>> indexes. Updates require maintaining all those indexes. Are all of
>>> these indexes really necessary? Do you have routine queries which look
>>> up users based on their flags? Or all all your oltp transactions for
>>> specific userids in which case you probably just need the index on
>>> userid.
>>
>>
>> We are using these indexes, but I can't be sure if we _really_ need them
>> or not.
>>
>> I can go into detail. We have:
>>
>> UserID - TopicID - LastReadAnswerID - WrittenStatus - IsFavorite
>>
>> So basically, we toggle the boolean flag WrittenStatus when the user has
>> written in that topic. The same goes for IsFavorite.
>
> Do those last two columns hold much data?  Another thing to consider is
> to split this into two tables:

The last two columns only store TRUE or FALSE, they're booleans. So
you're saying that an index on them might be useless ? We're retrieving
1000-2000 rows max and we need to extract only those who have TRUE on
the last column for example.

>
>  UserID - TopicID - LastReadAnswerID
>  UserID - TopicID - WrittenStatus - IsFavorite
>
> As others have pointed out, an UPDATE in Postgres is a
> select/delete/insert, and if you're updating just the LastReadAnswerID
> all the time, you're wasting time deleting and re-inserting a lot of
> data that never change (assuming they're not trivially small columns).

They are trivially small columns.

>
> This might also solve the problem of too many indexes -- the table
> that's updated frequently would only have an index on (UserID, TopicID),
> so the update only affects one index.

I'll investigate that way.

>
> Then to minimize the impact on your app, create a view that looks like
> the original table for read-only apps.

Good idea, thanks again.

Re: How would you store read/unread topic status?

От
Mathieu Nebra
Дата:
>>
>>> In our application we defer the updates to a separate asynchronous
>>> process using a simple queue mechanism, but in our case, we found that
>>> the updates are fast enough (in the order of a few milliseconds) not
>>> to warrant batching them into single transactions.
>>>
>>
>> A few milliseconds would be cool.
>> In fact, defering to another process is a good idea, but I'm not sure if
>> it is easy to implement. It would be great to have some sort of UPDATE
>> ... LOW PRIORITY to make the request non blocking.
>>
>> Thanks.
>>
>>
> I use pg_send_query()
> <http://ca2.php.net/manual/en/function.pg-send-query.php> in php to
> achieve this for a views counter. "Script execution is not blocked while
> the queries are executing."
>
> It looks like this may just be a direct translation of PQsendQuery()
> from libpq. Your preferred language may have a function like this.
>

I am using PHP. That was one of the thing I was looking for, thank you! :)
We'll combine this with a memcached solution so we just update every
1000 views for example.