Thread: [GENERAL] Serializable isolation -- are predicate locks still held across all databases?

[GENERAL] Serializable isolation -- are predicate locks still held across all databases?

From: "Karl O. Pinc"
Date:
Hi,

I forget all the details, but some time ago I found
that I had to increase max_pred_locks_per_transaction.
What I recall about the reason for this is that I'm
using the serializable transaction isolation, and that
I've a test database which occasionally has extremely
long running transactions.  The PG serializable
snapshot isolation implementation at the time (9.1?)
was holding predicate locks across all databases
during transactions, even though databases
are independent of each other.  The long transaction
times in the test database led to predicate lock
exhaustion in production databases -- only a single
transaction would be executing in the test database
but many would occur in the production databases.
(I don't know if there was potential for other bad effects
due to the production transactions "hanging around" until the
transaction in the test db finished.)

My question is whether this has changed.  Does PG
now pay attention to the database in its SSI implementation?

Thanks for the help and apologies if I'm not framing
the question perfectly.  It's not often I think about
this.

Regards,

Karl <kop@meme.com>
Free Software:  "You don't pay back, you pay forward."
                 -- Robert A. Heinlein


Re: [GENERAL] Serializable isolation -- are predicate locks still held across all databases?

From: Kevin Grittner
Date:
On Thu, May 18, 2017 at 11:07 AM, Karl O. Pinc <kop@meme.com> wrote:

> I forget all the details, but some time ago I found
> that I had to increase max_pred_locks_per_transaction.
> What I recall about the reason for this is that I'm
> using the serializable transaction isolation, and that
> I've a test database which occasionally has extremely
> long running transactions.  The PG serializable
> snapshot isolation implementation at the time (9.1?)
> was holding predicate locks across all databases
> during transactions, even though databases
> are independent of each other.  The long transaction
> times in the test database led to predicate lock
> exhaustion in production databases -- only a single
> transaction would be executing in the test database
> but many would occur in the production databases.
> (I don't know if there was potential for other bad effects
> due to the production transactions "hanging around" until the
> transaction in the test db finished.)
>
> My question is whether this has changed.  Does PG
> now pay attention to the database in its SSI implementation?

Well, it pays attention as far as the scope of each lock, but there
is only one variable to track how far back the oldest transaction ID
for a running serializable transaction goes, which is used in
cleanup of old locks.  I see your point, and it might be feasible to
change that to a list or map that tracks it by database; but I don't
even have a gut-feel estimate for the scale of such work without
investigating it.  Just out of curiosity, what is the reason you
don't move the production and test databases to separate instances?
If nothing else, an extremely long-running transaction in one database
can lead to bloat in others.
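
The effect of a single cleanup horizon versus a per-database one can be illustrated with a toy model. This is only a sketch under simplifying assumptions, not PostgreSQL code: the Txn class, the retained_locks function, and the integer "times" are hypothetical names and units invented for the illustration. The model assumes the locks of a finished transaction must be kept while some running serializable transaction that started before it finished is still open.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Txn:
    db: str                # database the transaction runs in
    start: int             # logical start time
    finish: Optional[int]  # logical finish time, None while still running
    locks: int             # number of predicate locks it took


def retained_locks(txns, per_database):
    """Count locks of finished transactions that cannot be cleaned up yet."""
    active = [t for t in txns if t.finish is None]
    retained = 0
    for t in txns:
        if t.finish is None:
            continue  # only finished transactions are candidates for cleanup
        # Horizon: oldest start among the running transactions that count.
        starts = [a.start for a in active if not per_database or a.db == t.db]
        horizon = min(starts, default=None)
        if horizon is not None and t.finish > horizon:
            retained += t.locks
    return retained


example = [
    Txn("test", start=1, finish=None, locks=5),  # long-running load
    Txn("prod", start=2, finish=3, locks=10),
    Txn("prod", start=4, finish=5, locks=10),
    Txn("prod", start=6, finish=7, locks=10),
]
print(retained_locks(example, per_database=False))  # prints 30
print(retained_locks(example, per_database=True))   # prints 0
```

With one horizon for the whole cluster, the open transaction in the test database pins the locks of every finished production transaction; with a horizon tracked per database, none of them are pinned in this model.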

> Thanks for the help and apologies if I'm not framing
> the question perfectly.  It's not often I think about
> this.

No sweat -- your concern/question is perfectly clear.  It's the
first time I've heard of someone with this particular issue, so at
this point I'm inclined to recommend the workaround of using a
separate cluster; but if we get other reports it might be worth
adding to the list of enhancements that SSI could use.

Thanks!

--
Kevin Grittner
VMware vCenter Server
https://www.vmware.com/


Re: [GENERAL] Serializable isolation -- are predicate locks still held across all databases?

From: "Karl O. Pinc"
Date:
On Thu, 18 May 2017 12:04:42 -0500
Kevin Grittner <kgrittn@gmail.com> wrote:

> On Thu, May 18, 2017 at 11:07 AM, Karl O. Pinc <kop@meme.com> wrote:
>
> > ...  Does PG
> > now pay attention to the database in its SSI implementation?
>
> Well, it pays attention as far as the scope of each lock, but there
> is only one variable to track how far back the oldest transaction ID
> for a running serializable transaction goes, which is used in
> cleanup of old locks.  I see your point, and it might be feasible to
> change that to a list or map that tracks it by database; but I don't
> even have a gut-feel estimate for the scale of such work without
> investigating it.  Just out of curiosity, what is the reason you
> don't move the production and test databases to separate instances?
> If nothing else, an extremely long-running transaction in one database
> can lead to bloat in others.

Ultimately it was easier to change the transaction isolation level
to repeatable read (or lower) for the transactions known to take
a long time.  Any concurrency issues (which have never arisen)
are handled at the human level.
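
A minimal sketch of that workaround, assuming the long-running load is driven by a script that emits SQL: the helper below wraps a list of statements in one transaction opened at an explicit isolation level. The function name wrap_in_transaction, the set of accepted levels, and the example_table statement are placeholders invented for the illustration; only the BEGIN TRANSACTION ISOLATION LEVEL and COMMIT syntax comes from PostgreSQL.

```python
# Isolation levels this sketch accepts (an assumption of the example).
ALLOWED_LEVELS = {"READ COMMITTED", "REPEATABLE READ", "SERIALIZABLE"}


def wrap_in_transaction(statements, isolation_level="REPEATABLE READ"):
    """Return a SQL script running `statements` in one transaction."""
    level = isolation_level.upper()
    if level not in ALLOWED_LEVELS:
        raise ValueError(f"unsupported isolation level: {isolation_level!r}")
    lines = [f"BEGIN TRANSACTION ISOLATION LEVEL {level};"]
    # Normalize each statement so it ends with exactly one semicolon.
    lines += [stmt.rstrip().rstrip(";") + ";" for stmt in statements]
    lines.append("COMMIT;")
    return "\n".join(lines)


# example_table is a hypothetical table used only for illustration.
print(wrap_in_transaction(["INSERT INTO example_table VALUES (1)"]))
```

The whole load still commits or rolls back as a unit, but it no longer runs as a serializable transaction.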

> > Thanks for the help and apologies if I'm not framing
> > the question perfectly.  It's not often I think about
> > this.
>
> No sweat -- your concern/question is perfectly clear.  It's the
> first time I've heard of someone with this particular issue, so at
> this point I'm inclined to recommend the workaround of using a
> separate cluster; but if we get other reports it might be worth
> adding to the list of enhancements that SSI could use.

Understood.

To give you an idea of the use-case, we're using Chado
(http://gmod.org/wiki/Chado), a PG database design
which stores genetic information.  The datasets being what
they are, they are big and take a long time to load.
This is especially true because the Chado designers
are enamored of ontologies and knowledge representation
and so there are a lot of tables where, instead of
having separate columns for different types of data,
there are simply 2 columns, "type" and "data".  The
type is an ontology entry and tells you what the
data is.  This makes for ugly queries in the process
of loading data (and ugly SQL in general).

So loading genetic data sets is slow.  Not really an issue
as there's no anticipation of loading a data set more
often than every 6 months to a year.  (Although non-genetic
data is loaded frequently.)

The workflow is to load data first into the
test db, possibly multiple times until satisfied.
Then load the data into production.  It is very handy,
especially in production, to load all related data
in a single transaction in the event something
goes wrong.

There are many non-optimal elements, not the least
of which is that it's not clear how much utility
there is in storing genetic datasets
in a relational db alongside our non-genetic data.
(We are finding out.)

Thanks for the help.

Karl <kop@meme.com>
Free Software:  "You don't pay back, you pay forward."
                 -- Robert A. Heinlein


Re: [GENERAL] Serializable isolation -- are predicate locks still held across all databases?

From: "Karl O. Pinc"
Date:
On Fri, 19 May 2017 01:52:00 -0500
"Karl O. Pinc" <kop@meme.com> wrote:

> On Thu, 18 May 2017 12:04:42 -0500
> Kevin Grittner <kgrittn@gmail.com> wrote:
>
> > On Thu, May 18, 2017 at 11:07 AM, Karl O. Pinc <kop@meme.com> wrote:
> >
> > > ...  Does PG
> > > now pay attention to the database in its SSI implementation?
> >
> > Well, it pays attention as far as the scope of each lock, but there
> > is only one variable to track how far back the oldest transaction ID
> > for a running serializable transaction goes, which is used in
> > cleanup of old locks.

> > ...  It's the
> > first time I've heard of someone with this particular issue, so at
> > this point I'm inclined to recommend the workaround of using a
> > separate cluster

I think if I were to make an argument for doing something it would
be based on reliability -- how many users can you give their
own database before somebody leaves an open transaction hanging?

Karl <kop@meme.com>
Free Software:  "You don't pay back, you pay forward."
                 -- Robert A. Heinlein


Re: [GENERAL] Serializable isolation -- are predicate locks still held across all databases?

From: Kevin Grittner
Date:
On Fri, May 19, 2017 at 6:56 AM, Karl O. Pinc <kop@meme.com> wrote:

> I think if I were to make an argument for doing something it would
> be based on reliability -- how many users can you give their
> own database before somebody leaves an open transaction hanging?

Yeah, I guess it's worth having on the list, where it will compete
with other possible enhancements on a cost/benefit basis.  Thanks
for raising the issue!

--
Kevin Grittner
VMware vCenter Server
https://www.vmware.com/