Discussion: One long transaction or multiple short transactions?


One long transaction or multiple short transactions?

From:
"Carlo"
Date:

We have a system which is constantly importing flat file data feeds into normalized tables in a DB warehouse over 10-20 connections. Each data feed row results in a single transaction of multiple single row writes to multiple normalized tables.

 

The more columns in the feed row, the more write operations there are, and the longer the transaction runs.

 

Operators are noticing that splitting a single feed of, say, 100 columns into two consecutive feeds of 50 columns improves performance dramatically. I am wondering whether the multi-threaded and very busy import environment causes non-linear performance degradation for longer transactions. Would the operators be advised to rewrite the feeds to produce more, smaller transactions rather than fewer, longer ones?
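As a rough sketch of the import pattern described above (using Python's built-in sqlite3 in place of the warehouse, with hypothetical table and column names), each feed row becomes one transaction of several single-row writes:

```python
import sqlite3

# Stand-in for the warehouse; sqlite3 replaces PostgreSQL and the
# table/column names are hypothetical.
conn = sqlite3.connect(":memory:", isolation_level=None)  # manage transactions by hand
conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE address (id INTEGER PRIMARY KEY, person_id INTEGER, city TEXT)")

def import_feed_row(conn, row):
    """One feed row -> one transaction of several single-row inserts.

    The more columns the feed row carries, the more single-row writes
    the transaction contains, and the longer it stays open.
    """
    cur = conn.cursor()
    cur.execute("BEGIN")
    cur.execute("INSERT INTO person (name) VALUES (?)", (row["name"],))
    person_id = cur.lastrowid
    for city in row["cities"]:
        cur.execute("INSERT INTO address (person_id, city) VALUES (?, ?)",
                    (person_id, city))
    cur.execute("COMMIT")  # everything for this feed row commits together

import_feed_row(conn, {"name": "Ada", "cities": ["Oslo", "Bergen"]})
print(conn.execute("SELECT COUNT(*) FROM address").fetchone()[0])  # 2
```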

 

Carlo

Re: One long transaction or multiple short transactions?

From:
Igor Neyman
Date:

 

 

From: pgsql-performance-owner@postgresql.org [mailto:pgsql-performance-owner@postgresql.org] On Behalf Of Carlo
Sent: Monday, October 05, 2015 11:11 PM
To: pgsql-performance@postgresql.org
Subject: [PERFORM] One long transaction or multiple short transactions?

 


> over 10-20 connections

 

How many cores do you have on that machine?

Test whether limiting the number of simultaneous feeds (for example, bringing it down to half of your normal connection count) has the same positive effect.
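The experiment Igor suggests, capping how many feeds run simultaneously, could be sketched like this (the names and the trivial worker body are placeholders for the real per-feed import):

```python
from concurrent.futures import ThreadPoolExecutor

def import_feed(feed_id):
    # Placeholder for one feed's import work on its own DB connection.
    return f"feed {feed_id} done"

feeds = list(range(20))     # normally 10-20 simultaneous feeds
MAX_CONCURRENT = 10         # the experiment: half the usual connection count

# The pool caps concurrency; all feeds still run, just fewer at once.
with ThreadPoolExecutor(max_workers=MAX_CONCURRENT) as pool:
    results = list(pool.map(import_feed, feeds))

print(len(results))  # all 20 feeds still complete
```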

 

Regards,

Igor Neyman

Re: One long transaction or multiple short transactions?

From:
"Carlo"
Date:

>> How many cores do you have on that machine?

Test if limiting number of simultaneous feeds, like bringing their number down to half of your normal connections has the same positive effect.

<< 

 

I am told 32 cores on a Linux VM. The operators have tried limiting the number of threads; they feel the number of connections is optimal. However, under the same conditions they noticed a sizable boost in performance when the same import was split into two successive imports with shorter transactions.

 

I am just looking to see if there is any reason to think that lock contention (or anything else) over longer vs. shorter single-row-write transactions under the same conditions might explain this.

 

Carlo

 


Re: One long transaction or multiple short transactions?

From:
"Graeme B. Bell"
Date:
Sounds like a locking problem, but assuming you aren't Sherlock Holmes and simply want to get the thing working as soon as possible:

Stick a fast SSD in there (whether you stay on VM or physical). If you have enough I/O, you may be able to solve the problem with brute force.
SSDs are a lot cheaper than your time.

Suggest you forward this to your operators: a talk I have about optimising multi-threaded work in postgres:

  http://graemebell.net/foss4gcomo.pdf     (Slides: "Input/Output" in the middle of the talk, and also the slides at the end labelled "For Techies")

Graeme Bell

p.s. You mentioned a VM. Consider making the machine physical rather than a VM. You'll get a performance boost and remove the risk of DB corruption from untrustworthy VM fsyncs. One day there will be a power cut or O/S crash during these writes, and with a VM you've a reasonable chance of nuking your DB, because VM virtualised storage often doesn't honour fsync (for performance reasons), yet fsync is fundamental to correct operation of PG.



> On 08 Oct 2015, at 01:40, Carlo <reg01@stonebanks.ca> wrote:
>
> I am told 32 cores on a LINUX VM. The operators have tried limiting the number of threads. They feel that the number of connections is optimal. However, under the same conditions they noticed a sizable boost in performance if the same import was split into two successive imports which had shorter transactions.
>


Re: One long transaction or multiple short transactions?

From:
"Carlo"
Date:
>> Sounds like a locking problem

This is what I am trying to get at. The reason that I am not addressing
hardware or OS configuration concerns is that this is not my environment,
but my client's. The client is running my import software and has a choice
of how long the transactions can be. They are going for long transactions,
and I am trying to determine whether there is a penalty for single long
transactions over a configuration which would allow for more successive
short transactions. (keep in mind all reads and writes are single-row).

There are other people working on hardware and OS configuration, and that's
why I don't want to get into a general optimization discussion; the
client is concerned with just this question.





Re: One long transaction or multiple short transactions?

From:
"ktm@rice.edu"
Date:
On Thu, Oct 08, 2015 at 11:08:55AM -0400, Carlo wrote:
> >> Sounds like a locking problem
>
> I am trying to determine whether there is a penalty for single long
> transactions over a configuration which would allow for more successive
> short transactions. (keep in mind all reads and writes are single-row).
>

Hi Carlo,

Since the reads/writes are basically independent, which is what I take your
"single-row" comment to mean, by batching them you are balancing two
opposing factors. First, larger batches allow you to consolidate I/O and
other resource requests to make them more efficient per row. Second, larger
batches require more locking as the number of rows updated grows. It may
very well be the case that by halving your batch size the system can
process the two batches more quickly than a single batch twice the size.
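The two opposing factors can be made concrete with a small sketch (Python's built-in sqlite3 stands in for PostgreSQL, and the table name is hypothetical): halving the batch size leaves the imported data identical but doubles the number of commits, so the open question is purely which shape the server processes faster.

```python
import sqlite3

def import_rows(conn, rows, batch_size):
    """Write `rows` in transactions of `batch_size` rows; return commit count."""
    commits = 0
    for lo in range(0, len(rows), batch_size):
        conn.execute("BEGIN")
        for v in rows[lo:lo + batch_size]:
            conn.execute("INSERT INTO t (v) VALUES (?)", (v,))
        conn.execute("COMMIT")
        commits += 1
    return commits

rows = [str(i) for i in range(100)]

for batch_size in (100, 50):   # one full-size batch vs two half-size batches
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute("CREATE TABLE t (v TEXT)")
    commits = import_rows(conn, rows, batch_size)
    count = conn.execute("SELECT COUNT(*) FROM t").fetchone()[0]
    # Same 100 rows land either way; only the commit count differs.
    print(batch_size, commits, count)
    conn.close()
```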

Regards,
Ken


Re: One long transaction or multiple short transactions?

From:
"Carlo"
Date:
On October 8, 2015 1:00 PM Ken wrote:
> It may very well be the case that by halving your batch size the system
> can process them more quickly than a single batch that is twice the size.

Just to clarify, one transaction of this type may take longer to commit than
two successive transactions of half the size?



Re: One long transaction or multiple short transactions?

From:
"ktm@rice.edu"
Date:
On Thu, Oct 08, 2015 at 05:43:11PM -0400, Carlo wrote:
> Just to clarify, one transaction of this type may take longer to commit than
> two successive transactions of half the size?
>

Yes, but where the optimum count lies should be determined by testing:
vary the batch size and note where performance reaches its maximum.
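A rough harness for that test might look like the following (again a sketch: the built-in sqlite3 stands in for PostgreSQL, since a self-contained example cannot assume a running server; on the real system the same sweep would drive the normal import path and its usual connections):

```python
import sqlite3
import time

def run_import(batch_size, total_rows=2000):
    """Insert total_rows rows in transactions of batch_size rows each;
    return elapsed seconds. The absolute numbers from an in-memory SQLite
    database are meaningless; only the sweep pattern matters."""
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, v TEXT)")
    start = time.perf_counter()
    for lo in range(0, total_rows, batch_size):
        conn.execute("BEGIN")
        for i in range(lo, min(lo + batch_size, total_rows)):
            conn.execute("INSERT INTO t (v) VALUES (?)", (str(i),))
        conn.execute("COMMIT")
    elapsed = time.perf_counter() - start
    assert conn.execute("SELECT COUNT(*) FROM t").fetchone()[0] == total_rows
    conn.close()
    return elapsed

# Sweep candidate batch sizes and note where throughput peaks.
timings = {size: run_import(size) for size in (50, 100, 500, 1000, 2000)}
best = min(timings, key=timings.get)
print("fastest batch size in this run:", best)
```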

Regards,
Ken


Re: One long transaction or multiple short transactions?

From:
Laurent Martelli
Date:

On 08/10/2015 01:40, Carlo wrote:

> I am just looking to see if there is any reason to think that lock contention (or anything else) over longer vs. shorter single-row-write transactions under the same conditions might explain this.

I don't think inserts can cause contention on the server. Inserts do not lock tables during the transaction. You may have contention on a sequence, but that won't vary with transaction size.

Have you checked the resource usage (CPU, memory) on the client side?

How do you insert rows? Do you use the plain Postgres API?

Regards,
Laurent

Re: One long transaction or multiple short transactions?

From:
"Graeme B. Bell"
Date:
> I don't think inserts can cause contention on the server. Insert do not lock tables during the transaction. You may have contention on sequence but it won't vary with transaction size.

Perhaps there could be a trigger on inserts which creates some lock contention?




Re: One long transaction or multiple short transactions?

From:
Jim Nasby
Date:
On 10/9/15 3:33 AM, Graeme B. Bell wrote:
>
>> I don't think inserts can cause contention on the server. Insert do not lock tables during the transaction. You may have contention on sequence but it won't vary with transaction size.
>
> Perhaps there could be a trigger on inserts which creates some lock contention?

Except inserts *do* take a lot of locks, just not user-level locks.
Operations like finding a page to insert into, seeing if that page is in
shared buffers, loading the page into shared buffers, modifying a shared
buffer, getting the relation extension lock if you need to add a new
page. Then there's a whole pile of additional locking you could be
looking at for inserting into any indexes.

Now, most of the locks I described up there are transaction-aware, but
there are other things happening at a transaction level that could alter
that locking. So it wouldn't surprise me if you're seeing radically
different behavior based on transaction duration.

Also, it sounds like perhaps longer transactions are involving more
tables? Is this a star schema you're dealing with?
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com


Re: One long transaction or multiple short transactions?

From:
Andres Freund
Date:
On 2015-10-17 10:26:01 -0500, Jim Nasby wrote:
> Except inserts *do* take a lot of locks, just not user-level locks.
> Operations like finding a page to insert into, seeing if that page is in
> shared buffers, loading the page into shared buffers, modifying a shared
> buffer, getting the relation extension lock if you need to add a new page.
> Then there's a whole pile of additional locking you could be looking at for
> inserting into any indexes.
>
> Now, most of the locks I described up there are transaction-aware

Missing *not*?


Re: One long transaction or multiple short transactions?

From:
Jim Nasby
Date:
On 10/17/15 12:13 PM, Andres Freund wrote:
> On 2015-10-17 10:26:01 -0500, Jim Nasby wrote:
>> Except inserts *do* take a lot of locks, just not user-level locks.
>> Operations like finding a page to insert into, seeing if that page is in
>> shared buffers, loading the page into shared buffers, modifying a shared
>> buffer, getting the relation extension lock if you need to add a new page.
>> Then there's a whole pile of additional locking you could be looking at for
>> inserting into any indexes.
>>
>> Now, most of the locks I described up there are transaction-aware
>
> Missing *not*?

Oops. Yes, they're *not* transaction-aware.
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com