Re: PostgreSQL design question
От | Jeffrey Tenny |
---|---|
Тема | Re: PostgreSQL design question |
Дата | |
Msg-id | 3FF58B27.5060006@comcast.net обсуждение исходный текст |
Ответ на | Re: PostgreSQL design question (Tom Lane <tgl@sss.pgh.pa.us>) |
Список | pgsql-jdbc |
Tom Lane wrote:
I start the transaction and query n_row for append-only tables. If I have fewer rows in my cache,
I know I need to read the rows inserted after n_rows (basically those records whose primary key is > n_rows).
I do this based on primary key and not update time, since I don't
want to store update times in the append-only table rows in order to know which were added 'since' a given time.
If someone updates the table that's all well and good, I'm in SERIALIZABLE state anyway,
so nothing about my view of the database will see any updates since I queried the update table.
Or am I still missing the point?
And I can logically max() n_rows (unless I see the light as to why that's worthless, I'm still missing that).
and insert a new row, every time I update the update-table? Or is that going to fragment the database somehow?
then updated at the end of the transaction if necessary.
Since I'm using SERIALIZABLE transaction isolation level, I should see B's commit, but not A's.
It sounds like you're considering the READ COMMITTED state of things, which is very interesting
because I'd like to be there for performance reasons, but right now I'm using SERIALIZABLE.
the update table, which I believe (falsely?) is guaranteed by SERIALIZABLE transaction semantics,
A's updates don't matter. My database view, and my memory cache view for C will all reflect a consistent data set.
(That's trickier than it sounds, since I share data in my memory cache across threads, and had to implement my
own memory cache transaction system to provide the right semantics).
I'm updating the row with CURRENT_TIMESTAMP(6) in all cases.
But perhaps you've illustrated a bug even with my assumptions about SERIALIZABLE transactions.
C could set the timestamp of the row to a time that postdates A's even though C commits first,
depending on each transactions delays between update and commit time.
But if this happens with serializable semantics, A would be rolled back because of a single row update conflict.
Using the idea of multiple rows per table, then A would have a row with an earlier timestamp than C's even though
it commits first. I think that's ok. Any transaction that postdates one or both of A and C
will be looking for the max update time.
I think the truly glaring problem is that if ever move to a READ COMMITTED isolation level,
what the update table says and what subsequent queries on the tables say will be potentially two different things.
I can handle that with my append-only tables, but for update/delete enabled tables I'd have to essentially lock the tables or rows
in suitable lock hierarchy order.
in Singapore and the database server is in Kansas, the cache becomes very important.
The Singapore folks would be pissed if their first application transaction of the morning took 30 minutes to execute
while loading lots of data from Kansas.
Since I'm not currently supporting any distributed database deployments the cache is important.
Still, a cache invalidation doesn't mean I need to re-read everything, especially for those append-only tables.
I haven't even looked at the database replication or 2 phase commit capabilities for PostgreSQL. I'm trying to write
code that is fairly least-common-denominator with respect to database capability. There are some requirements
of the databases I use though. MVCC is crucial to app performance given my use of complex serializable transactions,
MySql would just be stalled all the time since it doesn't have MVCC.
primary keys and strongly prefers them to be adjacent integer values (the latter more for performance than absolute necessity).
performance problems. Where I really lost sleep and decided I had to go with my own sequence allocator
was the nightmare scenario of my app end-user doing some thing really stupid in a script that caused
an endless loop of failed transactions beyond my control. Then the sequence generator gaps would be prohobitive.
Depending on the damage done to the database, that could require one of those nasty trips customer site to
repair the database, not to mention server downtime (anathema to my customers).
Better slow and steady than fast and unreliable, but there are limits to how slow...
(Fast, cheap, good, pick two. Sigh).
Thanks,
Dave
Since many of my tables are append-only, the logic is like this:Jeffrey Tenny <jeffrey.tenny@comcast.net> writes:To determine the cache coherency of my java memory cache of database table content, I maintain an "update table" for all tables to be cached. Whenever a transaction updates a table that is cached, it must also update the 'update table'.Hm. I do not actually see what it buys you to store n_rows in the update table; seems like the update time is the only thing that is useful. (With concurrent inserts under MVCC, n_rows is in the eye of the beholder anyway, so if your caching code *is* relying on it then it is wrong.)
I start the transaction and query n_row for append-only tables. If I have fewer rows in my cache,
I know I need to read the rows inserted after n_rows (basically those records whose primary key is > n_rows).
I do this based on primary key and not update time, since I don't
want to store update times in the append-only table rows in order to know which were added 'since' a given time.
If someone updates the table that's all well and good, I'm in SERIALIZABLE state anyway,
so nothing about my view of the database will see any updates since I queried the update table.
Or am I still missing the point?
I wasn't trying to get rid of n_rows, but yes, the table represents a bottleneck.However, getting rid of n_rows doesn't eliminate the fundamental problem that rows of the update table are serialization bottlenecks.
Recognizing that update_time is all you can usefully propagate anyway, consider this design: * Get rid of primary key --- there are allowed to be multiple rows in the update table for any given data table. * Updating transactions insert (not update) a row identifying the table they changed and giving their update_time.
Yup, been there, done that.* Caching transactions look at the max update_time for each table they are interested in. This is cheap to do efficiently with the right query and index --- see past discussions about optimizing max() operations. (Hint: you do *not* write max().)
And I can logically max() n_rows (unless I see the light as to why that's worthless, I'm still missing that).
Couldn't I take it a step further and simply delete all existing rows for the table (as the transaction sees it)* Periodically, a maintenance process deletes all but the newest row for each table, and then does a VACUUM of the update table. This avoids serializing on any individual row of the update table. The maintenance process doesn't block anything either (it's okay if it happens to leave more than one valid row for a particular data table on any given run).
and insert a new row, every time I update the update-table? Or is that going to fragment the database somehow?
Unless C is two transactions, the update table is inspected only once at the start of each transaction,However, this idea has a serious problem: it does not quite work because transactions may not commit in the same order they start. Consider this scenario: Transaction A starts A inserts some data Transaction B starts B inserts some data B inserts its update_time B commits C inspects max(update_time) C updates its cache A inserts its update_time A commits C inspects max(update_time) C updates its cache??
then updated at the end of the transaction if necessary.
Since I'm using SERIALIZABLE transaction isolation level, I should see B's commit, but not A's.
It sounds like you're considering the READ COMMITTED state of things, which is very interesting
because I'd like to be there for performance reasons, but right now I'm using SERIALIZABLE.
As long as C has a transaction coherent snapshot of all tables as they existed when C first inspectedIf update_time is the transaction's start time (now()), then C will in fact fail to update its cache for A's updates, because it will still see max(update_time) as being B's value. A's update_time is smaller than B's.
the update table, which I believe (falsely?) is guaranteed by SERIALIZABLE transaction semantics,
A's updates don't matter. My database view, and my memory cache view for C will all reflect a consistent data set.
(That's trickier than it sounds, since I share data in my memory cache across threads, and had to implement my
own memory cache transaction system to provide the right semantics).
I'm updating the row with CURRENT_TIMESTAMP(6) in all cases.
But perhaps you've illustrated a bug even with my assumptions about SERIALIZABLE transactions.
C could set the timestamp of the row to a time that postdates A's even though C commits first,
depending on each transactions delays between update and commit time.
But if this happens with serializable semantics, A would be rolled back because of a single row update conflict.
Using the idea of multiple rows per table, then A would have a row with an earlier timestamp than C's even though
it commits first. I think that's ok. Any transaction that postdates one or both of A and C
will be looking for the max update time.
Let me know if I've missed your point entirely or if I've explained why it isn't a problem.We can narrow the window for this failure by using current_timestamp (wall clock time) taken as close as possible before the transaction commits --- but I don't think we can eliminate the window entirely.
I think the truly glaring problem is that if ever move to a READ COMMITTED isolation level,
what the update table says and what subsequent queries on the tables say will be potentially two different things.
I can handle that with my append-only tables, but for update/delete enabled tables I'd have to essentially lock the tables or rows
in suitable lock hierarchy order.
Depends on the deployment topology. If the app servers have cached a gigabyte of dataBTW, I think your original design is vulnerable to this same problem. You may or may not be saved from it by the fact that you are serializing on update-table rows; it would depend on the details of your coding. I thought about using a sequence generator instead of now() to generate the update_time values, but that doesn't seem to help. Does anyone see a way to make this 100% reliable?I've avoided using any notifications from the server (I'm not sure JDBC even supports it, haven't tried), since I'm unsure such mechanisms are reliable.LISTEN/NOTIFY would be more reliable than this, mainly because it ensures that notifications are not delivered until after the sending transaction commits (and therefore you cannot miss seeing its updates). However, I'm not sure JDBC supports it either :-(. Anyone know?Furthermore, app servers in some hostile customer network environments have idle network connections dropped after some number of hours. If that happens, the app server will miss messages and have to potentially completely reinitialize its cache.[ shrug... ] Use dummy keepalive transactions to prevent the connection from looking idle for long periods. In any case, I can't believe that infrequent cache reloads are a fatal objection.
in Singapore and the database server is in Kansas, the cache becomes very important.
The Singapore folks would be pissed if their first application transaction of the morning took 30 minutes to execute
while loading lots of data from Kansas.
Since I'm not currently supporting any distributed database deployments the cache is important.
Still, a cache invalidation doesn't mean I need to re-read everything, especially for those append-only tables.
I haven't even looked at the database replication or 2 phase commit capabilities for PostgreSQL. I'm trying to write
code that is fairly least-common-denominator with respect to database capability. There are some requirements
of the databases I use though. MVCC is crucial to app performance given my use of complex serializable transactions,
MySql would just be stalled all the time since it doesn't have MVCC.
Most of my ID use would be fine to have temporary gaps. I have only one entity that absolutely has to have monotonically increasingI'm also curious if anybody has developed a sequence generator for PostgreSQL that doesn't permit gaps in the sequence based on failed transactions.It's easy to see that any such thing *must* be a serialization bottleneck: you can't assign the next sequence ID till you see if the guy who took the previous one commits. Otherwise you have gaps. You can imagine mechanisms that will reissue the IDs taken by failed transactions to the next requestors. However, this just means that you have transient instead of permanent gaps in the ID sequence (since later IDs might be used by xacts that commit before those that successfully use reissued IDs). Most of the scenarios I can think of for wanting non-gapping ID sequences will fail just as miserably with transient gaps as with permanent ones.
primary keys and strongly prefers them to be adjacent integer values (the latter more for performance than absolute necessity).
Rewording the my above reply, most of my non-gap requirements are only in the long term to avoidIn fact, if your criterion for "no gaps" is "no one who looks at the current set of committed values ever sees a gap", then you really don't have any choice: you have to ensure that the xact that took ID n commits before the one that took ID n+1 commits; it's not enough they commit, the commits have to occur in the right order. So you'd have to have serialization at commit time even if you avoided serializing at the ID-issuance point. I would counsel taking a very hard look at why you think you need non-gapping IDs, rather than engaging in wishful thinking about finding an efficient way to have them.
performance problems. Where I really lost sleep and decided I had to go with my own sequence allocator
was the nightmare scenario of my app end-user doing some thing really stupid in a script that caused
an endless loop of failed transactions beyond my control. Then the sequence generator gaps would be prohobitive.
Depending on the damage done to the database, that could require one of those nasty trips customer site to
repair the database, not to mention server downtime (anathema to my customers).
Better slow and steady than fast and unreliable, but there are limits to how slow...
(Fast, cheap, good, pick two. Sigh).
Thanks,
Dave
В списке pgsql-jdbc по дате отправления: