Обсуждение: MVCC and Implications for (Near) Real-Time Application

Поиск
Список
Период
Сортировка

MVCC and Implications for (Near) Real-Time Application

От
Steve Wong
Дата:
Hi experts,

I have a (near) real-time application in which inserts into the database needs to be visible to queries from other threads with minimal delay. The inserts are triggered by real-time events and are therefore asynchronous (i.e. many performance tips I read related to batch inserts or copy do not apply here, since these events cannot be predicted or batched), and the inserted data need to be available within a couple of seconds to other threads (for example, an inserted row that only appears to other query threads 5 seconds or more after the insert is not acceptable). The delay should be under 2 seconds maximum, sub-1 second would be great.

My questions are: (1) Does the MVCC architecture introduce significant delays between insert by a thread and visibility by other threads (I am unclear about how multiple versions are "collapsed" or reconciled, as well as how different query threads are seeing which version)? (2) Are there any available benchmarks that can measure this delay? (3) What are relevant config parameters that will reduce this delay?

Thanks for your patience with my ignorance of MVCC (still learning more about it),
Steve

Re: MVCC and Implications for (Near) Real-Time Application

От
"A.M."
Дата:
On Oct 25, 2010, at 2:46 PM, Steve Wong wrote:

> Hi experts,
>
> I have a (near) real-time application in which inserts into the database needs
> to be visible to queries from other threads with minimal delay. The inserts are
> triggered by real-time events and are therefore asynchronous (i.e. many
> performance tips I read related to batch inserts or copy do not apply here,
> since these events cannot be predicted or batched), and the inserted data need
> to be available within a couple of seconds to other threads (for example, an
> inserted row that only appears to other query threads 5 seconds or more after
> the insert is not acceptable). The delay should be under 2 seconds maximum,
> sub-1 second would be great.
>
> My questions are: (1) Does the MVCC architecture introduce significant delays
> between insert by a thread and visibility by other threads (I am unclear about
> how multiple versions are "collapsed" or reconciled, as well as how different
> query threads are seeing which version)? (2) Are there any available benchmarks
> that can measure this delay? (3) What are relevant config parameters that will
> reduce this delay?

There is no way to know without testing whether your hardware, OS, database schema, and database load can meet your
demands.However, there is no technical reason why PostgreSQL could not meet your timing goals- MVCC does not inherently
introducedelays, however the PostgreSQL implementation requires a cleanup process which can introduce latency. 

If you find that your current architecture is not up to the task, consider using LISTEN/NOTIFY with a payload (new in
9.0),which we are using for a similar "live-update" system. 

Cheers,
M



Re: MVCC and Implications for (Near) Real-Time Application

От
"Kevin Grittner"
Дата:
Steve Wong <powerpchead@yahoo.com> wrote:

> (1) Does the MVCC architecture introduce significant delays
> between insert by a thread and visibility by other threads (I am
> unclear about how multiple versions are "collapsed" or reconciled,
> as well as how different query threads are seeing which version)?

As soon as the inserting transaction commits the inserted row is
visible to new snapshots.  If you are in an explicit transaction the
commit will have occurred before the return from the COMMIT request;
otherwise it will have completed before the return from the INSERT
request.

You will get a new snapshot for every statement in READ COMMITTED
(or lower) transaction isolation.  You will get a new snapshot for
each database transaction in higher isolation levels.

-Kevin

Re: MVCC and Implications for (Near) Real-Time Application

От
"Pierre C"
Дата:
> My questions are: (1) Does the MVCC architecture introduce significant
> delays between insert by a thread and visibility by other threads

As said by others, once commited it is immediately visible to all

> (2) Are there any available benchmarks that can measure this delay?

Since you will not be batching INSERTs, you will use 1 INSERT per
transaction.
If you use Autocommit mode, that's it.
If you don't, you will get a few extra network roundtrips after the
INSERT, to send the COMMIT.

One INSERT is usually extremely fast unless you're short on RAM and the
indexes that need updating need some disk seeking.

Anyway, doing lots of INSERTs each in its own transaction is usually very
low-throughput, because at each COMMIT, postgres must always be sure that
all the data is actually written to the harddisks. So, depending on the
speed of your harddisks, each COMMIT can take up to 10-20 milliseconds.

On a 7200rpm harddisk, it is absolutely impossible to do more than 7200
commits/minute if you want to be sure each time that the data really is
written on the harddisk, unless :

- you use several threads (one disk write can group several commits from
different connections, see the config file docs)
- you turn of synchronous_commit ; in this case commit is instantaneous,
but if your server loses power or crashes, the last few seconds of data
may be lost (database integrity is still guaranteed though)
- you use a battery backup cache on your RAID controller, in this case
"written to the harddisks" is replaced by "written to batteyr backed RAM"
which is a lot faster

If you dont use battery backed cache, place the xlog on a different RAID1
array than the tables/indexes, this allows committing of xlog records
(which is the time critical part) to proceed smoothly and not be disturbed
by other IO on the indexes/tables. Also consider tuning your bgwriter and
checkpoints, after experimentation under realistic load conditions.

So, when you benchmark your application, if you get disappointing results,
think about this...