Re: PostgreSQL design question

Поиск
Список
Период
Сортировка
От Jeffrey Tenny
Тема Re: PostgreSQL design question
Дата
Msg-id 3FF58B27.5060006@comcast.net
обсуждение исходный текст
Ответ на Re: PostgreSQL design question  (Tom Lane <tgl@sss.pgh.pa.us>)
Список pgsql-jdbc
Tom Lane wrote:
Jeffrey Tenny <jeffrey.tenny@comcast.net> writes: 
To determine the cache coherency of my java memory cache of database 
table content,
I maintain an "update table" for all tables to be cached.
Whenever a transaction updates a table that is cached, it must also 
update the 'update table'.   
Hm.  I do not actually see what it buys you to store n_rows in the
update table; seems like the update time is the only thing that is
useful.  (With concurrent inserts under MVCC, n_rows is in the eye
of the beholder anyway, so if your caching code *is* relying on it
then it is wrong.) 
Since many of my tables are append-only, the logic is like this:

I start the transaction and query n_row for append-only tables.  If I have fewer rows in my cache,
I know I need to read the rows inserted after n_rows (basically those records whose primary key is > n_rows). 
I do this based on primary key and not update time, since I don't
want to store update times in the append-only table rows in order to know which were added 'since' a given time.

If someone updates the table that's all well and good, I'm in SERIALIZABLE state anyway,
so nothing about my view of the database will see any updates since I queried the update table.

Or am I still missing the point?
However, getting rid of n_rows doesn't eliminate the fundamental problem
that rows of the update table are serialization bottlenecks.
I wasn't trying to get rid of n_rows, but yes, the table represents a bottleneck.

Recognizing that update_time is all you can usefully propagate anyway,
consider this design:

* Get rid of primary key --- there are allowed to be multiple rows in
the update table for any given data table.

* Updating transactions insert (not update) a row identifying the table
they changed and giving their update_time.
* Caching transactions look at the max update_time for each table they
are interested in.  This is cheap to do efficiently with the right query
and index --- see past discussions about optimizing max() operations.
(Hint: you do *not* write max().)
Yup, been there, done that.
And I can logically max() n_rows (unless I see the light as to why that's worthless, I'm still missing that).

* Periodically, a maintenance process deletes all but the newest row for
each table, and then does a VACUUM of the update table.

This avoids serializing on any individual row of the update table.
The maintenance process doesn't block anything either (it's okay if
it happens to leave more than one valid row for a particular data
table on any given run).
Couldn't I take it a step further and simply delete all existing rows for the table (as the transaction sees it)
and insert a new row, every time I update the update-table?  Or is that going to fragment the database somehow?

However, this idea has a serious problem: it does not quite work because
transactions may not commit in the same order they start.  Consider this
scenario:

Transaction A starts

A inserts some data
		Transaction B starts
		B inserts some data
		B inserts its update_time
		B commits
						C inspects max(update_time)
						C updates its cache

A inserts its update_time

A commits
						C inspects max(update_time)
						C updates its cache?? 
Unless C is two transactions, the update table is inspected only once at the start of each transaction,
then updated at the end of the transaction if necessary.

Since I'm using SERIALIZABLE transaction isolation level, I should see B's commit, but not A's.
It sounds like you're considering the READ COMMITTED state of things, which is very interesting
because I'd like to be there for performance reasons, but right now I'm using SERIALIZABLE.
If update_time is the transaction's start time (now()), then C will in
fact fail to update its cache for A's updates, because it will still see
max(update_time) as being B's value.  A's update_time is smaller than B's. 
As long as C has a transaction coherent snapshot of all tables as they existed when C first inspected
the update table, which I believe (falsely?) is guaranteed by SERIALIZABLE transaction semantics,
A's updates don't matter.  My database view, and my memory cache view for C will all reflect a consistent data set.
(That's trickier than it sounds, since I share data in my memory cache across threads, and had to implement my
own memory cache transaction system to provide the right semantics).

I'm updating the row with  CURRENT_TIMESTAMP(6) in all cases.

But perhaps you've illustrated a bug even with my assumptions about SERIALIZABLE transactions.
C  could set the timestamp of the row to a time that postdates A's even though C commits first,
depending on each transactions delays between update and commit time.

But if this happens with serializable semantics, A would be rolled back because of a single row update conflict.

Using the idea of multiple rows per table, then A would have a row with an earlier timestamp than C's even though
it commits first.  I think that's ok.  Any transaction that postdates one or both of A and C
will be looking for the max update time.


We can narrow the window for this failure by using current_timestamp
(wall clock time) taken as close as possible before the transaction
commits --- but I don't think we can eliminate the window entirely. 
Let me know if I've missed your point entirely or if I've explained why it isn't a problem.

I think the truly glaring problem is that if ever move to a READ COMMITTED isolation level,
what the update table says and what subsequent queries on the tables say will be potentially two different things.
I can handle that with my append-only tables, but for update/delete enabled tables I'd have to essentially lock the tables or rows
in suitable lock hierarchy order.
BTW, I think your original design is vulnerable to this same problem.
You may or may not be saved from it by the fact that you are serializing
on update-table rows; it would depend on the details of your coding.

I thought about using a sequence generator instead of now() to generate
the update_time values, but that doesn't seem to help.  Does anyone
see a way to make this 100% reliable?
 
I've avoided using any notifications from the server (I'm not sure JDBC 
even supports it, haven't tried),
since I'm unsure such mechanisms are reliable.   
LISTEN/NOTIFY would be more reliable than this, mainly because it
ensures that notifications are not delivered until after the sending
transaction commits (and therefore you cannot miss seeing its updates).
However, I'm not sure JDBC supports it either :-(.  Anyone know?
 
Furthermore, app servers in some hostile customer network environments
have idle network connections dropped after some number of hours. If
that happens, the app server will miss messages and have to
potentially completely reinitialize its cache.   
[ shrug... ]  Use dummy keepalive transactions to prevent the connection
from looking idle for long periods.  In any case, I can't believe that
infrequent cache reloads are a fatal objection. 
Depends on the deployment topology.  If the app servers have cached a gigabyte of data
in Singapore and the database server is in Kansas, the cache becomes very important.
The Singapore folks would be pissed if their first application transaction of the morning took 30 minutes to execute
while loading lots of data from Kansas.

Since I'm not currently supporting any distributed database deployments the cache is important.
Still, a cache invalidation doesn't mean I need to re-read everything, especially for those append-only tables.

I haven't even looked at the database replication or 2 phase commit capabilities for PostgreSQL.  I'm trying to write
code that is fairly least-common-denominator with respect to database capability.  There are some requirements
of the databases I use though.  MVCC is crucial to app performance given my use of complex serializable transactions,
MySql would just be stalled all the time since it doesn't have MVCC.
 
I'm also curious if anybody has developed a sequence generator for 
PostgreSQL that doesn't permit
gaps in the sequence based on failed transactions.   
It's easy to see that any such thing *must* be a serialization
bottleneck: you can't assign the next sequence ID till you see if
the guy who took the previous one commits.  Otherwise you have gaps.

You can imagine mechanisms that will reissue the IDs taken by failed
transactions to the next requestors.  However, this just means that you
have transient instead of permanent gaps in the ID sequence (since
later IDs might be used by xacts that commit before those that
successfully use reissued IDs).  Most of the scenarios I can think of
for wanting non-gapping ID sequences will fail just as miserably with
transient gaps as with permanent ones.
Most of my ID use would be fine to have temporary gaps.  I have only one entity that absolutely has to have monotonically increasing
primary keys and strongly prefers them to be adjacent integer values (the latter more for performance than absolute necessity).

In fact, if your criterion for "no gaps" is "no one who looks at the
current set of committed values ever sees a gap", then you really don't
have any choice: you have to ensure that the xact that took ID n
commits before the one that took ID n+1 commits; it's not enough they
commit, the commits have to occur in the right order.  So you'd have to
have serialization at commit time even if you avoided serializing at the
ID-issuance point.

I would counsel taking a very hard look at why you think you need
non-gapping IDs, rather than engaging in wishful thinking about finding
an efficient way to have them.
Rewording the my above reply, most of my non-gap requirements are only in the long term to avoid
performance problems.  Where I really lost sleep and decided  I had to go with my own sequence allocator
was the nightmare scenario of my app end-user doing some thing really stupid in a script that caused
an endless loop of failed transactions beyond my control.  Then the sequence generator gaps would be prohobitive.
Depending on the damage done to the database, that could require one of those nasty trips customer site to
repair the database, not to mention server downtime (anathema to my customers).

Better slow and steady than fast and unreliable, but there are limits to how slow... 
(Fast, cheap, good, pick two.  Sigh).

Thanks,

Dave

В списке pgsql-jdbc по дате отправления:

Предыдущее
От: Jeffrey Tenny
Дата:
Сообщение: For Tom Lane
Следующее
От: Joe Conway
Дата:
Сообщение: Re: [HACKERS] PL/Java issues