Discussion: Performance on Bulk Insert to Partitioned Table


Performance on Bulk Insert to Partitioned Table

From: Charles Gomes
Date:
Hello guys,

I'm doing 1.2 billion inserts into a table partitioned into 15 child tables.

When I target the MASTER table on all the inserts and let the trigger decide
what partition to choose, it takes 4 hours.

If I target the partitioned table directly during the insert I get 4 times
better performance: it takes 1 hour.

I'm trying to get more performance while still using the trigger to choose the
table, so partitions can be changed without changing the application that
inserts the data.

What I noticed is that iostat is not showing an I/O bottleneck.

iostat -xN 1

Device:         rrqm/s  wrqm/s   r/s      w/s  rsec/s    wsec/s avgrq-sz avgqu-sz  await  svctm  %util
Pgresql--data     0.00    0.00  0.00  8288.00    0.00  66304.00     8.00    60.92   7.35   0.01   4.30

iostat -m 1

Device:            tps  MB_read/s  MB_wrtn/s  MB_read  MB_wrtn
dm-2           4096.00       0.00      16.00        0       16

 

I also don't see a CPU bottleneck or a context-switching bottleneck.

PostgreSQL does not seem to write more than 16 MB/s or 4K transactions per
second unless I target each individual partition.

Has anybody done studies on partitioned bulk-insert performance?

Any suggestions on a way to accelerate it?

Running pgsql 9.2.2 on RHEL 6.3.

My trigger is pretty straightforward:

CREATE OR REPLACE FUNCTION quotes_insert_trigger()
RETURNS trigger AS $$
DECLARE
    tablename varchar(24);
    bdate varchar(10);
    edate varchar(10);
BEGIN
    tablename = 'quotes_' || to_char(new.received_time,'YYYY_MM_DD');
    EXECUTE 'INSERT INTO '|| tablename ||' VALUES (($1).*)' USING NEW;
    RETURN NULL;
END;
$$
LANGUAGE plpgsql;

 

CREATE TRIGGER quotes_insert_trigger
BEFORE INSERT ON quotes
FOR EACH ROW EXECUTE PROCEDURE quotes_insert_trigger();

Thanks,
Charles

Re: Performance on Bulk Insert to Partitioned Table

From: Scott Marlowe
Date:
On Thu, Dec 20, 2012 at 10:29 AM, Charles Gomes <charlesrg@outlook.com> wrote:
> Hello guys
>
> I’m doing 1.2 Billion inserts into a table partitioned in
> 15.
>
> When I target the MASTER table on all the inserts and let
> the trigger decide what partition to choose from it takes 4 hours.
>
> If I target the partitioned table directly during the
> insert I can get 4 times better performance. It takes 1 hour.
>
> I’m trying to get more performance while still using the
> trigger to choose the table, so partitions can be changed without changing the
> application that inserts the data.
>
> What I noticed is that iostat is not showing an I/O bottleneck.

SNIP

> I also don't see a CPU bottleneck or a context-switching bottleneck.

Are you sure? How are you measuring CPU usage?  If you've got > 1
core, you might need to look at individual cores in which case you
should see a single core maxed out.

Without writing your trigger in C you're not likely to do much better
than you're doing now.


Re: Performance on Bulk Insert to Partitioned Table

From: Charles Gomes
Date:
None of the cores went to 100%. Looking at top during the inserts I can see
several cores working, but never more than 60% busy. The machine has 8 cores
(16 with HT).
The load is spread across the cores; no single core was maxed out. However,
with HT on, technically it is overloaded.

 top - 13:14:07 up 7 days,  3:10,  3 users,  load average: 0.25, 0.12, 0.10
 Tasks: 871 total,  13 running, 858 sleeping,   0 stopped,   0 zombie
 Cpu(s): 60.6%us,  5.0%sy,  0.0%ni, 34.1%id,  0.0%wa,  0.0%hi,  0.2%si,  0.0%st
 Mem:  49282716k total,  9311612k used, 39971104k free,   231116k buffers
 Swap: 44354416k total,   171308k used, 44183108k free,  2439608k cached

   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 21832 postgres  20   0 22.7g  93m  90m S   15  0.2   0:19.91 postgres: cgomes historical_ticks 10.254.109.10(44093)
COPY
 21817 postgres  20   0 22.7g  92m  89m S   15  0.2   0:20.24 postgres: cgomes historical_ticks 10.254.109.10(44077)
idle
 21842 postgres  20   0 22.7g  96m  93m S   15  0.2   0:20.39 postgres: cgomes historical_ticks 10.254.109.10(44103)
COPY
 21792 postgres  20   0 22.7g  93m  90m R   15  0.2   0:20.34 postgres: cgomes historical_ticks 10.254.109.10(44045)
COPY
 21793 postgres  20   0 22.7g  90m  88m S   15  0.2   0:20.13 postgres: cgomes historical_ticks 10.254.109.10(44048)
COPY
 21806 postgres  20   0 22.7g  94m  91m S   15  0.2   0:20.14 postgres: cgomes historical_ticks 10.254.109.10(44066)
COPY
 21809 postgres  20   0 22.7g  92m  89m S   15  0.2   0:19.82 postgres: cgomes historical_ticks 10.254.109.10(44069)
COPY
 21813 postgres  20   0 22.7g  92m  89m S   15  0.2   0:19.98 postgres: cgomes historical_ticks 10.254.109.10(44073)
COPY
 21843 postgres  20   0 22.7g  95m  92m S   15  0.2   0:20.56 postgres: cgomes historical_ticks 10.254.109.10(44104)
COPY
 21854 postgres  20   0 22.7g  91m  88m S   15  0.2   0:20.08 postgres: cgomes historical_ticks 10.254.109.10(44114)
COPY
 21796 postgres  20   0 22.7g  89m  86m S   14  0.2   0:20.03 postgres: cgomes historical_ticks 10.254.109.10(44056)
COPY
 21797 postgres  20   0 22.7g  92m  90m R   14  0.2   0:20.18 postgres: cgomes historical_ticks 10.254.109.10(44057)
COPY
 21804 postgres  20   0 22.7g  95m  92m S   14  0.2   0:20.28 postgres: cgomes historical_ticks 10.254.109.10(44064)
COPY
 21807 postgres  20   0 22.7g  94m  91m S   14  0.2   0:20.15 postgres: cgomes historical_ticks 10.254.109.10(44067)
COPY
 21808 postgres  20   0 22.7g  92m  89m S   14  0.2   0:20.05 postgres: cgomes historical_ticks 10.254.109.10(44068)
COPY
 21815 postgres  20   0 22.7g  90m  88m S   14  0.2   0:20.13 postgres: cgomes historical_ticks 10.254.109.10(44075)
COPY
 21818 postgres  20   0 22.7g  91m  88m S   14  0.2   0:20.01 postgres: cgomes historical_ticks 10.254.109.10(44078)
COPY
 21825 postgres  20   0 22.7g  92m  89m S   14  0.2   0:20.00 postgres: cgomes historical_ticks 10.254.109.10(44085)
COPY
 21836 postgres  20   0 22.7g  91m  88m R   14  0.2   0:20.22 postgres: cgomes historical_ticks 10.254.109.10(44097)
COPY
 21857 postgres  20   0 22.7g  89m  86m R   14  0.2   0:19.92 postgres: cgomes historical_ticks 10.254.109.10(44118)
COPY
 21858 postgres  20   0 22.7g  95m  93m S   14  0.2   0:20.36 postgres: cgomes historical_ticks 10.254.109.10(44119)
COPY
 21789 postgres  20   0 22.7g  92m  89m S   14  0.2   0:20.05 postgres: cgomes historical_ticks 10.254.109.10(44044)
COPY
 21795 postgres  20   0 22.7g  93m  90m S   14  0.2   0:20.27 postgres: cgomes historical_ticks 10.254.109.10(44055)
COPY
 21798 postgres  20   0 22.7g  89m  86m S   14  0.2   0:20.06 postgres: cgomes historical_ticks 10.254.109.10(44058)
COPY
 21800 postgres  20   0 22.7g  93m  90m S   14  0.2   0:20.04 postgres: cgomes historical_ticks 10.254.109.10(44060)
COPY
 21802 postgres  20   0 22.7g  89m  87m S   14  0.2   0:20.10 postgres: cgomes historical_ticks 10.254.109.10(44062)
COPY


 Looks like I will have to disable HT.


 I've been looking at converting the trigger to C, but could not find
 a good example trigger for partitions written in C to start from. Have
you heard of anyone implementing the partitioning trigger in C?


Re: Performance on Bulk Insert to Partitioned Table

From: Stephen Frost
Date:
Charles,

* Charles Gomes (charlesrg@outlook.com) wrote:
> I’m doing 1.2 Billion inserts into a table partitioned in
> 15.

Do you end up having multiple threads writing to the same underlying
tables?  If so, I've seen that problem before.  Look at pg_locks while
things are running and see if there are 'extend' locks that aren't being
immediately granted.
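
For example, a quick check along these lines (relation-extension locks show up
in pg_locks with locktype 'extend'):

    SELECT locktype, relation::regclass AS rel, pid, mode, granted
    FROM pg_locks
    WHERE locktype = 'extend' AND NOT granted;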

Basically, there's a lock that PG has on a per-relation basis to extend
the relation (by a mere 8K..) which will block other writers.  If
there's a lot of contention around that lock, you'll get poor
performance and it'll be faster to have independent threads writing
directly to the underlying tables.  I doubt rewriting the trigger in C
will help if the problem is the extent lock.

If you do get this working well, I'd love to hear what you did to
accomplish that.  Note also that you can get bottle-necked on the WAL
data, unless you've taken steps to avoid that WAL.

    Thanks,

        Stephen

Attachments

Re: Performance on Bulk Insert to Partitioned Table

From: Charles Gomes
Date:
Without hyperthreading the CPU is still not a bottleneck, while I/O is at only 10% utilization.

top - 14:55:01 up 27 min,  2 users,  load average: 0.17, 0.19, 0.14
Tasks: 614 total,  17 running, 597 sleeping,   0 stopped,   0 zombie
Cpu(s): 73.8%us,  4.3%sy,  0.0%ni, 21.6%id,  0.1%wa,  0.0%hi,  0.1%si,  0.0%st
Mem:  49282716k total,  5855492k used, 43427224k free,    37400k buffers
Swap: 44354416k total,        0k used, 44354416k free,  1124900k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
19903 postgres  20   0 22.7g  34m  32m S  9.6  0.1   0:02.66 postgres: cgomes historical_ticks 10.254.109.10(46103)
COPY
19934 postgres  20   0 22.7g  34m  32m S  9.6  0.1   0:02.61 postgres: cgomes historical_ticks 10.254.109.10(46134)
COPY
19947 postgres  20   0 22.7g  34m  31m S  9.6  0.1   0:02.64 postgres: cgomes historical_ticks 10.254.109.10(46147)
COPY
19910 postgres  20   0 22.7g  34m  32m S  9.2  0.1   0:02.67 postgres: cgomes historical_ticks 10.254.109.10(46110)
COPY
19924 postgres  20   0 22.7g  33m  31m S  9.2  0.1   0:02.65 postgres: cgomes historical_ticks 10.254.109.10(46124)
COPY
19952 postgres  20   0 22.7g  34m  32m R  9.2  0.1   0:02.71 postgres: cgomes historical_ticks 10.254.109.10(46152)
COPY
19964 postgres  20   0 22.7g  34m  32m R  9.2  0.1   0:02.59 postgres: cgomes historical_ticks 10.254.109.10(46164)
COPY
19901 postgres  20   0 22.7g  35m  32m S  8.9  0.1   0:02.66 postgres: cgomes historical_ticks 10.254.109.10(46101)
COPY
19914 postgres  20   0 22.7g  34m  31m S  8.9  0.1   0:02.62 postgres: cgomes historical_ticks 10.254.109.10(46114)
COPY
19923 postgres  20   0 22.7g  34m  31m S  8.9  0.1   0:02.74 postgres: cgomes historical_ticks 10.254.109.10(46123)
COPY
19925 postgres  20   0 22.7g  34m  31m R  8.9  0.1   0:02.65 postgres: cgomes historical_ticks 10.254.109.10(46125)
COPY
19926 postgres  20   0 22.7g  34m  32m S  8.9  0.1   0:02.79 postgres: cgomes historical_ticks 10.254.109.10(46126)
COPY
19929 postgres  20   0 22.7g  34m  31m S  8.9  0.1   0:02.64 postgres: cgomes historical_ticks 10.254.109.10(46129)
COPY
19936 postgres  20   0 22.7g  34m  32m S  8.9  0.1   0:02.72 postgres: cgomes historical_ticks 10.254.109.10(46136)
COPY

I believe the bottleneck may be that pgsql has to fight with its siblings to
update the indexes. Is there a good way to add probes to check where things
are slowing down?



Re: Performance on Bulk Insert to Partitioned Table

From: Jeff Janes
Date:
On Thu, Dec 20, 2012 at 9:29 AM, Charles Gomes <charlesrg@outlook.com> wrote:
> Hello guys
>
>
>
> I’m doing 1.2 Billion inserts into a table partitioned in
> 15.
>
>
>
> When I target the MASTER table on all the inserts and let
> the trigger decide what partition to choose from it takes 4 hours.
>
> If I target the partitioned table directly during the
> insert I can get 4 times better performance. It takes 1 hour.

How do you target them directly?  By implementing the
"trigger-equivalent-code" in the application code tuple by tuple, or
by pre-segregating the tuples and then bulk loading each segment to
its partition?

What if you get rid of the partitioning and just load data to the
master, is that closer to 4 hours or to 1 hour?

...
>
>
> What I noticed is that iostat is not showing an I/O bottleneck.
>
> iostat -xN 1
>
> Device:         rrqm/s  wrqm/s   r/s      w/s  rsec/s    wsec/s avgrq-sz avgqu-sz  await  svctm  %util
> Pgresql--data     0.00    0.00  0.00  8288.00    0.00  66304.00     8.00    60.92   7.35   0.01   4.30

8288 randomly scattered writes per second sound like enough to
bottleneck a pretty impressive RAID.  Or am I misreading that?

Cheers,

Jeff


Re: Performance on Bulk Insert to Partitioned Table

From: Charles Gomes
Date:
Yes, I'm doing multiple threads inserting into the same tables.
I don't think WAL is the issue, as I even tried going async (non-ACID) and
disabled sync after writes, but still couldn't reach full performance.

I've checked the locks and I see lots of ExclusiveLocks with:

select * from pg_locks order by mode

   locktype    | database | relation | page | tuple | virtualxid | transactionid | classid | objid | objsubid | virtualtransaction |  pid  |           mode           | granted | fastpath
---------------+----------+----------+------+-------+------------+---------------+---------+-------+----------+--------------------+-------+--------------------------+---------+----------
 relation      |    16385 |    19295 |      |       |            |               |         |       |          | 72/18              | 19879 | AccessShareLock          | t       | t
 relation      |    16385 |    11069 |      |       |            |               |         |       |          | 76/32              | 19881 | AccessShareLock          | t       | t
 virtualxid    |          |          |      |       | 56/34      |               |         |       |          | 56/34              | 17952 | ExclusiveLock            | t       | t
 virtualxid    |          |          |      |       | 27/33      |               |         |       |          | 27/33              | 17923 | ExclusiveLock            | t       | t
 virtualxid    |          |          |      |       | 6/830      |               |         |       |          | 6/830              | 17902 | ExclusiveLock            | t       | t
 virtualxid    |          |          |      |       | 62/34      |               |         |       |          | 62/34              | 17959 | ExclusiveLock            | t       | t
 virtualxid    |          |          |      |       | 51/34      |               |         |       |          | 51/34              | 17947 | ExclusiveLock            | t       | t
 virtualxid    |          |          |      |       | 36/34      |               |         |       |          | 36/34              | 17932 | ExclusiveLock            | t       | t
 virtualxid    |          |          |      |       | 10/830     |               |         |       |          | 10/830             | 17906 | ExclusiveLock            | t       | t
 ...(about 56 of those)
 transactionid |          |          |      |       |            |         30321 |         |       |          | 55/33              | 17951 | ExclusiveLock            | t       | f
 transactionid |          |          |      |       |            |         30344 |         |       |          | 19/34              | 17912 | ExclusiveLock            | t       | f
 transactionid |          |          |      |       |            |         30354 |         |       |          | 3/834              | 17898 | ExclusiveLock            | t       | f
 transactionid |          |          |      |       |            |         30359 |         |       |          | 50/34              | 17946 | ExclusiveLock            | t       | f
 transactionid |          |          |      |       |            |         30332 |         |       |          | 9/830              | 17905 | ExclusiveLock            | t       | f
 transactionid |          |          |      |       |            |         30294 |         |       |          | 37/33              | 17933 | ExclusiveLock            | t       | f
 transactionid |          |          |      |       |            |         30351 |         |       |          | 38/34              | 17934 | ExclusiveLock            | t       | f
 transactionid |          |          |      |       |            |         30326 |         |       |          | 26/33              | 17922 | ExclusiveLock            | t       | f
 ...(about 52 of those)
 relation      |    16385 |    19291 |      |       |            |               |         |       |          | 72/18              | 19879 | ShareUpdateExclusiveLock | t       | f
 ...(3 of those)
 relation      |    16385 |    19313 |      |       |            |               |         |       |          | 33/758             | 17929 | RowExclusiveLock         | t       | t
 ...(211 of those)


However, I don't see any of the EXTEND locks mentioned.

I would give translating the trigger to C a try, but I can't code it without a
good sample to start from. If anyone has one and would like to share, I would
love to start from it and share with other people so everyone can benefit.


Re: Performance on Bulk Insert to Partitioned Table

From: Ondrej Ivanič
Date:
Hi,

On 21 December 2012 04:29, Charles Gomes <charlesrg@outlook.com> wrote:
> When I target the MASTER table on all the inserts and let
> the trigger decide what partition to choose from it takes 4 hours.
>
> If I target the partitioned table directly during the
> insert I can get 4 times better performance. It takes 1 hour.

Yes, that's my experience as well. Triggers are the slowest.
Performance of "DO INSTEAD" rule is close to direct inserts but rule
setup is complex (each partition needs one):

 create or replace rule <master_table>_insert_<partition_name> as
     on insert to <master_table>
     where new.<part_column> >= ... and new.<part_column> < ...
     do instead
         insert into <master_table>_<partition_name> values (new.*)
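
A concrete instance for the daily quotes_ partitions in this thread might look
like this (hypothetical date bounds; one such rule per partition):

 create or replace rule quotes_insert_2012_09_10 as
     on insert to quotes
     where new.received_time >= '2012-09-10' and new.received_time < '2012-09-11'
     do instead
         insert into quotes_2012_09_10 values (new.*);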

The best is to use direct inserts (into the partition) if you can.

--
Ondrej Ivanic
(http://www.linkedin.com/in/ondrejivanic)


Re: Performance on Bulk Insert to Partitioned Table

From: Charles Gomes
Date:
Jeff,

The 8288 writes/s are fine, as the array has a BBU. You see only about 4% utilization.

To target directly, instead of doing:
INSERT INTO TABLE VALUES ()
I use:
INSERT INTO TABLE_PARTITION_01 VALUES()

By targeting it I see a huge performance increase.

I haven't tested using 1 billion rows in a single table. The issue is that in
the future it will grow to more than 1 billion rows; it will get to about
4 billion rows, and that's when I believe partitioning will be a major
improvement.



Re: Performance on Bulk Insert to Partitioned Table

From: Charles Gomes
Date:
True, that's how I feel as well. I will be looking to translate the trigger to
C if I can find good examples; that should accelerate it.
Using rules would be totally bad, as I'm partitioning daily and after one year
having 365 lines of IF won't be fun to maintain.


Re: Performance on Bulk Insert to Partitioned Table

From: Tom Lane
Date:
Charles Gomes <charlesrg@outlook.com> writes:
> Using rules would be totally bad, as I'm partitioning daily and after one
> year having 365 lines of IF won't be fun to maintain.

You should probably rethink that plan anyway.  The existing support for
partitioning is not meant to support hundreds of partitions; you're
going to be bleeding performance in a lot of places if you insist on
doing that.

            regards, tom lane


Re: Performance on Bulk Insert to Partitioned Table

From: Scott Marlowe
Date:
On Thu, Dec 20, 2012 at 4:39 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Charles Gomes <charlesrg@outlook.com> writes:
>> Using rules would be totally bad, as I'm partitioning daily and after one
>> year having 365 lines of IF won't be fun to maintain.
>
> You should probably rethink that plan anyway.  The existing support for
> partitioning is not meant to support hundreds of partitions; you're
> going to be bleeding performance in a lot of places if you insist on
> doing that.

A couple of points:

1: In my experience hundreds is OK performance-wise, but as you
approach thousands you fall off a cliff, and performance is terrible.
So at the 3 to 4 year mark daily partition tables will definitely be
having problems.

2: A good way around this is to have partitions for the last x days,
last x weeks or months before that, and x years going however far
back.  This keeps the number of partitions low.  Just dump the oldest
day into a weekly partition until the next week starts, then dump the
oldest week into monthly, etc.  As long as you have lower-traffic times
of day or enough bandwidth it works pretty well.  Or you can just use
daily partitions until things start going boom and fix it all at a later
date.  It's probably better to be proactive though; a rough sketch of the
roll-up follows point 3.

3: Someone above mentioned rules being faster than triggers.  In my
experience they're WAY slower than triggers but maybe that was just on
the older pg versions (8.3 and lower) we were doing this on.  I'd be
interested in seeing some benchmarks if rules have gotten faster or I
was just doing it wrong.
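
A rough sketch of the roll-up described in point 2 (hypothetical daily/weekly
table names; assumes the weekly partition already exists):

    BEGIN;
    -- move the oldest day into its weekly partition, then drop it
    INSERT INTO quotes_week_2012_37 SELECT * FROM quotes_2012_09_10;
    ALTER TABLE quotes_2012_09_10 NO INHERIT quotes;
    DROP TABLE quotes_2012_09_10;
    COMMIT;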


Re: Performance on Bulk Insert to Partitioned Table

From: Jeff Janes
Date:


On Thursday, December 20, 2012, Charles Gomes wrote:
> Jeff,
>
> The 8288 writes/s are fine, as the array has a BBU. You see only about 4% utilization.

BBU is great for latency, but it doesn't do much for throughput, unless it is doing write combining behind the scenes.  Is it HDD or SSD behind the BBU?  Have you bench-marked it on randomly scattered 8k writes?

I've seen %util reports that were low while watching a strace showed obvious IO freezes.  So I don't know how much faith to put into low %util.  

 

> To target directly, instead of doing:
> INSERT INTO TABLE VALUES ()
> I use:
> INSERT INTO TABLE_PARTITION_01 VALUES()

But how is it deciding what partition to use?  Does it have to re-decide for every row, or does each thread serve only one partition throughout its life and so makes the decision only once?

 

> By targeting it I see a huge performance increase.

But is that because by targeting you are bypassing the overhead of triggers, or is it because you are loading the rows in an order which leads to more efficient index maintenance?

 
> I haven't tested using 1 billion rows in a single table. The issue is that in the future it will grow to more than 1 billion rows; it will get to about 4 billion rows, and that's when I believe partitioning will be a major improvement.

The way that partitioning gives you performance improvements is by you embracing the partitioning, for example by targeting the loading to just one partition without any indexes, creating indexes, and then atomically attaching it to the table.  If you wish to have partitions, but want to use triggers to hide that partitioning from you, then I don't think you can expect to get much of a speed up through using partitions.
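
A sketch of that embrace-the-partitioning load (hypothetical file path; the
CHECK constraint should match the partition's date range so constraint
exclusion still works):

    -- load into a standalone table with no indexes
    CREATE TABLE quotes_2012_09_10 (LIKE quotes INCLUDING DEFAULTS);
    COPY quotes_2012_09_10 FROM '/tmp/quotes_2012_09_10.csv' CSV;
    -- build indexes after the load, then attach atomically
    CREATE INDEX ON quotes_2012_09_10 (received_time);
    ALTER TABLE quotes_2012_09_10 ADD CONSTRAINT quotes_2012_09_10_ck
        CHECK (received_time >= '2012-09-10' AND received_time < '2012-09-11');
    ALTER TABLE quotes_2012_09_10 INHERIT quotes;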

Any way, the way I would approach it would be to load to a single un-partitioned table, and also load to a single dummy-partitioned table which uses a trigger that looks like the one you want to use for real, but directs all rows to a single partition.  If these loads take the same time, you know it is not the trigger which is limiting.
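
A minimal sketch of that dummy-partitioned trigger (hypothetical table name
quotes_dummy; same dynamic-SQL shape as the real trigger, but a fixed target):

    CREATE OR REPLACE FUNCTION quotes_dummy_insert_trigger()
    RETURNS trigger AS $$
    BEGIN
        -- identical EXECUTE overhead, but always the same partition
        EXECUTE 'INSERT INTO quotes_dummy VALUES (($1).*)' USING NEW;
        RETURN NULL;
    END;
    $$ LANGUAGE plpgsql;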

Cheers,

Jeff

Re: Performance on Bulk Insert to Partitioned Table

From: Charles Gomes
Date:
Tom, I may have to rethink it. I'm going to have about 100 million rows per day
(5 days a week), 2 billion per month. My point on partitioning was to be able
to store 6 months of data on a single machine: about 132 partitions in a total
of 66 billion rows.



Re: Performance on Bulk Insert to Partitioned Table

From: Charles Gomes
Date:
The BBU does combine the writes.

I've benchmarked using a single table and it took 1:34:21.549959 to insert
1188000000 rows (70 writers to a single table).

I've also benchmarked having writers target individual partitions, and they
get the same job done in 1 hour.

I/O is definitely not the bottleneck.

Without changing hardware it accelerates things almost 4 times, so it looks
like there is a delay in the way PostgreSQL handles the partitions, or in the
time taken for the trigger to select which partition to insert into.


When targeting, I issue commands that insert directly into the partition
("INSERT INTO quotes_DATE VALUES (...), (...), ..."), 10k rows at a time.
When not targeting, I leave it to the trigger to decide:



CREATE OR REPLACE FUNCTION quotes_insert_trigger()
RETURNS trigger AS $$
DECLARE
    tablename varchar(24);
    bdate varchar(10);
    edate varchar(10);
BEGIN
    tablename = 'quotes_' || to_char(new.received_time,'YYYY_MM_DD');
    EXECUTE 'INSERT INTO '|| tablename ||' VALUES (($1).*)' USING NEW;
    RETURN NULL;
END;
$$
LANGUAGE plpgsql;


Maybe translating this trigger to C could help, but I haven't heard of anyone
who used partitioning with a trigger in C, and I don't have the know-how to do
it without examples.


Re: Performance on Bulk Insert to Partitioned Table

From: Jeff Janes
Date:
On Thursday, December 20, 2012, Charles Gomes wrote:
> Without hyperthreading the CPU is still not a bottleneck, while I/O is at only 10% utilization.
>
> top - 14:55:01 up 27 min,  2 users,  load average: 0.17, 0.19, 0.14
> Tasks: 614 total,  17 running, 597 sleeping,   0 stopped,   0 zombie
> Cpu(s): 73.8%us,  4.3%sy,  0.0%ni, 21.6%id,  0.1%wa,  0.0%hi,  0.1%si,  0.0%st


...
 
> I believe the bottleneck may be that pgsql has to fight with its siblings to update the indexes.

I think that should mostly show up as idle or wait, not as user time.  Since you're at 75% user already, you couldn't expect more than a 33% improvement by fixing that, assuming that that were the problem.


> Is there a good way to add probes to check where things are slowing down?

What happens if you just drop the indexes as an experiment?  That should put a hard limit on the amount the indexes can be slowing you down.

I like oprofile to give you good bottom level profile of where CPU time is going.  Unfortunately, it will probably just show you "AllocSetAlloc".  Also, I don't trust it on virtualized systems, if you are on one of those.

There are many other ways of approaching it, but mostly you have to already have a good theory about what is going on in order know which one to use or to interpret the results, and many of them require you to make custom compiles of the postgres server code.
 

Cheers,

Jeff

Re: Performance on Bulk Insert to Partitioned Table

From: Jeff Janes
Date:
On Thursday, December 20, 2012, Charles Gomes wrote:
> True, that's how I feel as well. I will be looking to translate the trigger to C if I can find good examples; that should accelerate it.

I think your performance bottleneck is almost certainly the dynamic SQL.  Using C to generate that dynamic SQL isn't going to help much, because it is still the SQL engine that has to parse, plan, and execute it.

Are the vast majority of your inserts done on any given day for records from that same day or the one before; or are they evenly spread over the preceding year?  If the former, you could use static SQL in IF and ELSIF for those days, and fall back on the dynamic SQL for the exceptions in the ELSE block.  Of course that means you have to update the trigger every day.

 
> Using rules would be totally bad, as I'm partitioning daily and after one year having 365 lines of IF won't be fun to maintain.

Maintaining 365 lines of IF is what Perl was invented for.  That goes for triggers w/ static SQL as well as for rules.

If you do the static SQL in a trigger and the dates of the records are evenly scattered over the preceding year, make sure your IFs are nested like a binary search, not a linear search.  And if they are mostly for "today's" date, then make sure you search backwards.
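
A minimal sketch of the nested layout over four days (daily quotes_ partitions
as in this thread; extend the pattern, halving the date range at each level):

    CREATE OR REPLACE FUNCTION quotes_insert_trigger()
    RETURNS trigger AS $$
    BEGIN
        -- two comparisons per row instead of up to four
        IF new.received_time >= '2012-09-12' THEN
            IF new.received_time >= '2012-09-13' THEN
                INSERT INTO quotes_2012_09_13 VALUES (NEW.*);
            ELSE
                INSERT INTO quotes_2012_09_12 VALUES (NEW.*);
            END IF;
        ELSE
            IF new.received_time >= '2012-09-11' THEN
                INSERT INTO quotes_2012_09_11 VALUES (NEW.*);
            ELSE
                INSERT INTO quotes_2012_09_10 VALUES (NEW.*);
            END IF;
        END IF;
        RETURN NULL;
    END;
    $$ LANGUAGE plpgsql;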

Cheers,

Jeff

Re: Performance on Bulk Insert to Partitioned Table

From: Charles Gomes
Date:

Jeff, I've changed the code from dynamic to:

CREATE OR REPLACE FUNCTION quotes_insert_trigger()
RETURNS trigger AS $$
DECLARE
    r_date text;
BEGIN
    r_date = to_char(new.received_time, 'YYYY_MM_DD');
    -- static SQL: one WHEN branch per daily partition
    case r_date
        when '2012_09_10' then
            insert into quotes_2012_09_10 values (NEW.*);
        when '2012_09_11' then
            insert into quotes_2012_09_11 values (NEW.*);
        when '2012_09_12' then
            insert into quotes_2012_09_12 values (NEW.*);
        when '2012_09_13' then
            insert into quotes_2012_09_13 values (NEW.*);
        when '2012_09_14' then
            insert into quotes_2012_09_14 values (NEW.*);
        when '2012_09_15' then
            insert into quotes_2012_09_15 values (NEW.*);
        when '2012_09_16' then
            insert into quotes_2012_09_16 values (NEW.*);
        when '2012_09_17' then
            insert into quotes_2012_09_17 values (NEW.*);
        when '2012_09_18' then
            insert into quotes_2012_09_18 values (NEW.*);
        when '2012_09_19' then
            insert into quotes_2012_09_19 values (NEW.*);
        when '2012_09_20' then
            insert into quotes_2012_09_20 values (NEW.*);
        when '2012_09_21' then
            insert into quotes_2012_09_21 values (NEW.*);
        when '2012_09_22' then
            insert into quotes_2012_09_22 values (NEW.*);
        when '2012_09_23' then
            insert into quotes_2012_09_23 values (NEW.*);
        when '2012_09_24' then
            insert into quotes_2012_09_24 values (NEW.*);
    end case;
    RETURN NULL;
END;
$$
LANGUAGE plpgsql;


However, I got no speed improvement.
I need to keep two weeks' worth of partitions at a time; that's why all the
WHEN statements.
I wish Postgres could automate the partitioning process natively, like the
other SQL databases do.

Thank you guys for your help.

Re: Performance on Bulk Insert to Partitioned Table

From: Charles Gomes
Date:
By the way, I've just rewritten the code to target the partitions individually
and I've got almost 4 times improvement.
Shouldn't it be faster to process the trigger? I would understand if there was
no CPU left, but there is lots of CPU to chew.
It seems that there will be no other way to speed it up unless the insert code
is partition-aware.


Re: Performance on Bulk Insert to Partitioned Table

From: Evgeny Shishkin
Date:
On Dec 24, 2012, at 9:07 PM, Charles Gomes <charlesrg@outlook.com> wrote:

> By the way, I've just rewritten the code to target the partitions
> individually and I've got almost 4 times improvement.
> Shouldn't it be faster to process the trigger? I would understand if there
> was no CPU left, but there is lots of CPU to chew.

I saw your 20% idle CPU and raised eyebrows.

> It seems that there will be no other way to speed it up unless the insert code is partition-aware.
>



Re: Performance on Bulk Insert to Partitioned Table

From: Charles Gomes
Date:
I've just found this:
From:
http://archives.postgresql.org/pgsql-hackers/2008-12/msg01221.php

"initial tests to insert 140k rows are as follows:

- direct inserts in a child table: 2 seconds

- pgplsql trigger (IF ... ELSE IF ... blocks) : 14.5 seconds.

- C trigger: 4 seconds (actually the overhead is in the constraint check)

"

This is from 2008, and it looks like those folks were already having
performance issues with partitions back then.

Going to copy some folks from the old thread; hopefully 4 years later they may
have found a solution. Maybe they've moved on to something more exciting;
maybe he is in another world where we don't have database servers. In
particular the brave Emmanuel, for posting his trigger code that I will hack
into my own :P
Thanks, Emmanuel.






----------------------------------------
> Subject: Re: [PERFORM] Performance on Bulk Insert to Partitioned Table
> From: itparanoia@gmail.com
> Date: Mon, 24 Dec 2012 21:11:07 +0400
> CC: jeff.janes@gmail.com; ondrej.ivanic@gmail.com; pgsql-performance@postgresql.org
> To: charlesrg@outlook.com
>
>
> On Dec 24, 2012, at 9:07 PM, Charles Gomes <charlesrg@outlook.com> wrote:
>
> > By the way, I've just re-wrote the code to target the partitions individually and I've got almost 4 times improvement.
> > Shouldn't it be faster to process the trigger? I would understand if there was no CPU left, but there is lots of cpu to chew.
>
> I saw your 20% idle cpu and raised eyebrows.
>
> > It seems that there will be no other way to speedup unless the insert code is partition aware.
> >
> > <snip - earlier messages quoted in full above>

Re: Performance on Bulk Insert to Partitioned Table

От
Charles Gomes
Дата:
Emmanuel, I really appreciate you getting back on this old old topic.
Wish you a very very happy Christmas and happy new year.

I'm kinda disappointed to see that since 2008 pgsql has not evolved to support native partitioning. Partitioning with triggers is so slow.
Looks like pgsql lost some momentum after the departure of contributors with initiative like you.

I've copied the code from your post; I'm modifying it for 9.2 and will post it back here.

Thank you very much,
Charles

----------------------------------------
> Date: Mon, 24 Dec 2012 13:47:12 -0500
> From: cecchet@gmail.com
> To: charlesrg@outlook.com
> CC: itparanoia@gmail.com; jeff.janes@gmail.com; ondrej.ivanic@gmail.com; pgsql-performance@postgresql.org;
manu@frogthinker.org;robertmhaas@gmail.com; stark@enterprisedb.com 
> Subject: Re: [PERFORM] Performance on Bulk Insert to Partitioned Table
>
> Hi Charles,
>
> I am not working on Postgres anymore and none of our patches were ever
> accepted by the community.
> The list of development I made can still be found at
> http://wiki.postgresql.org/wiki/Aster%27s_Development_Projects
>
> All the code related to these improvements must still be accessible in
> the archive. If you can't find something, let me know, I'll try to find
> it in my backups!
>
> Happy holidays
> Emmanuel
>
>
> On 12/24/2012 13:36, Charles Gomes wrote:
> > <snip - message quoted in full above>

Re: Performance on Bulk Insert to Partitioned Table

От
Jeff Janes
Дата:
On Monday, December 24, 2012, Charles Gomes wrote:
>
> I think your performance bottleneck is almost certainly the dynamic
> SQL.  Using C to generate that dynamic SQL isn't going to help much,
> because it is still the SQL engine that has to parse, plan, and execute
> it.
>

Jeff, I've changed the code from dynamic to:

CREATE OR REPLACE FUNCTION quotes_insert_trigger()
RETURNS trigger AS $$
DECLARE
r_date text;
BEGIN
r_date = to_char(new.received_time, 'YYYY_MM_DD');
case r_date
    when '2012_09_10' then
        insert into quotes_2012_09_10 values (NEW.*) using new;
        return;
...


However I've got no speed improvement.
I need to keep two weeks worth of partitions at a time, that's why all the WHEN statements.

The 'using new' and return without argument are syntax errors.
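For reference, a corrected sketch looks like this (illustrative only - a minimal shape, not necessarily the exact code tested):

CREATE OR REPLACE FUNCTION quotes_insert_trigger()
RETURNS trigger AS $$
DECLARE
    r_date text;
BEGIN
    r_date := to_char(NEW.received_time, 'YYYY_MM_DD');
    CASE r_date
        WHEN '2012_09_10' THEN
            INSERT INTO quotes_2012_09_10 VALUES (NEW.*);  -- static SQL: no USING clause
            RETURN NULL;  -- a row trigger must return something; NULL suppresses the insert into the parent
        WHEN '2012_09_11' THEN
            INSERT INTO quotes_2012_09_11 VALUES (NEW.*);
            RETURN NULL;
        -- ...one WHEN per partition...
        ELSE
            RAISE EXCEPTION 'no partition for received_time %', NEW.received_time;
    END CASE;
END;
$$ LANGUAGE plpgsql;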

When I do a model system with those fixed, I get about 2 fold improvement over the dynamic SQL performance.  Even if your performance did not go up, did your CPU usage go down?  Perhaps you have multiple bottlenecks all sitting at about the same place, and so tackling any one of them at a time doesn't get you anywhere.

How do both the dynamic and the CASE versions scale with the number of threads?  I think you said you had something like 70 sessions, but only 8 CPUs.  That probably will do bad things with contention, and I don't see how using more connections than CPUs is going to help you here.  If the CASE starts out faster in a single thread but then flat-lines and the EXECUTE catches up, that suggests a different avenue of investigation than if they are always the same.

 
Wish postgres could automate the partition process natively like the other sql db.

More automated would be nice (i.e. one operation to make both the check constraints and the trigger, so they can't get out of sync), but would not necessarily mean faster.  I don't know what you mean about other db.  Last time I looked at partitioning in mysql, it was only about breaking up the underlying storage into separate files (without regard to the contents of the rows), so that is the same as what postgres does automatically.  And in Oracle, their partitioning seemed about the same as postgres's as far as administrative tedium was concerned.  I'm not familiar with how the MS product handles it, and maybe my experience with the other two is out of date.
 
Cheers,

Jeff

Re: Performance on Bulk Insert to Partitioned Table

От
Pavel Stehule
Дата:
2012/12/27 Jeff Janes <jeff.janes@gmail.com>:
> On Monday, December 24, 2012, Charles Gomes wrote:
> <snip - earlier exchange trimmed>
>
> More automated would be nice (i.e. one operation to make both the check
> constraints and the trigger, so they can't get out of sync), but would not
> necessarily mean faster.  I don't know what you mean about other db.  Last
> time I looked at partitioning in mysql, it was only about breaking up the
> underlying storage into separate files (without regards to contents of the
> rows), so that is the same as what postgres does automatically.  And in
> Oracle, their partitioning seemed about the same as postgres's as far as
> administrative tedium was concerned.  I'm not familiar with how the MS
> product handles it, and maybe me experience with the other two are out of
> date.

I did a simple test - not too precise (just for a first orientation) - tested on 9.3, compiled without assertions.

Inserting 0.5M rows into an empty target table with one trivial trigger and one index takes about 4 sec.

The same with a slightly more complex trigger - one IF statement and two assignment statements - takes about 5 sec.

Simple forwarding to two tables - 8 sec.

Using dynamic SQL is significantly slower - 18 sec - probably due to the overhead of re-planning, since EXECUTE does not use cached plans.

The overhead depends on the number of partitions and the number of indexes, but I expect the overhead of redistribution triggers to be about 50-100% (less on large tables, higher on small tables).
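For illustration, the shapes of the variants compared look roughly like this (a sketch; the actual test code was not posted, and the table names and the id column are made up):

-- static forwarding (the "simple forwarding to two tables" case)
CREATE OR REPLACE FUNCTION fwd_static() RETURNS trigger AS $$
BEGIN
    IF NEW.id % 2 = 0 THEN
        INSERT INTO t_even VALUES (NEW.*);
    ELSE
        INSERT INTO t_odd VALUES (NEW.*);
    END IF;
    RETURN NULL;
END;
$$ LANGUAGE plpgsql;

-- dynamic SQL variant: EXECUTE re-parses and re-plans on every call,
-- which is where the extra time goes
CREATE OR REPLACE FUNCTION fwd_dynamic() RETURNS trigger AS $$
BEGIN
    EXECUTE format('INSERT INTO %I VALUES (($1).*)',
                   CASE WHEN NEW.id % 2 = 0 THEN 't_even' ELSE 't_odd' END)
    USING NEW;
    RETURN NULL;
END;
$$ LANGUAGE plpgsql;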

A native implementation could evaluate expressions significantly more efficiently, mainly simple expressions (this is significant for a large number of partitions), and could probably do tuple forwarding faster than a heavyweight INSERT statement (it is a question whether some overhead could be removed with more sophisticated syntax, by avoiding the record expansion).

So a native implementation could carry a significant speedup - mainly if we can distribute tuples without expression evaluation (currently done by the executor).

Regards

Pavel






Re: Performance on Bulk Insert to Partitioned Table

От
Charles Gomes
Дата:
________________________________
> Date: Wed, 26 Dec 2012 23:03:33 -0500
> Subject: Re: [PERFORM] Performance on Bulk Insert to Partitioned Table
> From: jeff.janes@gmail.com
> To: charlesrg@outlook.com
> CC: ondrej.ivanic@gmail.com; pgsql-performance@postgresql.org
>
> On Monday, December 24, 2012, Charles Gomes wrote:
> <snip - quoted in full above>
>
> The 'using new' and return without argument are syntax errors.
>
> When I do a model system with those fixed, I get about 2 fold
> improvement over the dynamic SQL performance.  Even if your performance
> did not go up, did your CPU usage go down?  Perhaps you have multiple
> bottlenecks all sitting at about the same place, and so tackling any
> one of them at a time doesn't get you anywhere.



I've run a small test with the fixes you mentioned and it changed from 1H:20M to 1H:30M to insert 396000000 rows.

If there was another bottleneck, performance when targeting the partitions directly would not be twice as fast. I've run another long insert test and it takes 4H:15M to complete using triggers to distribute the inserts. When targeting the partitions directly it completes in 1H:55M. That's both for 70 simultaneous workers with the same data and 1188000000 rows.


 

The tests that Emmanuel did translating the trigger to C show a great performance improvement, and his code is general enough to work for anyone using CHECKs with triggers. I'm still working on making it compatible with 9.2.

So far I'm having a hard time loading the C trigger anyway:

ERROR:  could not load library
"/var/lib/pgsql/pg_trigger_example.so":
/var/lib/pgsql/pg_trigger_example.so: failed to map segment from shared object:
Operation not permitted

I will do more reading on it.

I think getting it to work again can bring some value, so more people can be aware of the performance improvement of C over PL/pgSQL.


>
> How do both the dynamic and the CASE versions scale with the number of
> threads?  I think you said you had something like 70 sessions, but only
> 8 CPUs.  That probably will do bad things with contention, and I don't
> see how using more connections than CPUs is going to help you here.  If
> the CASE starts out faster in a single thread but then flat-lines and the
> EXECUTE catches up, that suggests a different avenue of investigation
> than if they are always the same.
>


I didn't see a significant change in CPU utilization; it seems to be a bit less, but not that much. However, IO is still idling.


>
> Wish postgres could automate the partition process natively like the
> other sql db.
>
> More automated would be nice (i.e. one operation to make both the check
> constraints and the trigger, so they can't get out of sync), but would
> not necessarily mean faster.  I don't know what you mean about other
> db.  Last time I looked at partitioning in mysql, it was only about
> breaking up the underlying storage into separate files (without regards
> to contents of the rows), so that is the same as what postgres does
> automatically.  And in Oracle, their partitioning seemed about the same
> as postgres's as far as administrative tedium was concerned.  I'm not
> familiar with how the MS product handles it, and maybe my experience
> with the other two is out of date.



The other free sql DB supports a more elaborate scheme, for example:

CREATE TABLE ti (id INT, amount DECIMAL(7,2), tr_date DATE)
    ENGINE=INNODB
    PARTITION BY HASH( MONTH(tr_date) )
    PARTITIONS 6;

It also supports partitioning by RANGE, LIST or KEY.

The paid one uses a very similar style:

CREATE TABLE dept (deptno NUMBER, deptname VARCHAR(32))
     PARTITION BY HASH(deptno) PARTITIONS 16;

Also:

CREATE TABLE sales
  ( prod_id       NUMBER(6)
  , cust_id       NUMBER
  , time_id       DATE
  , quantity_sold NUMBER(3)
  )
 PARTITION BY RANGE (time_id)
 ( PARTITION sales_q1_2006 VALUES LESS THAN (TO_DATE('01-APR-2006','dd-MON-yyyy'))
    TABLESPACE tsa
 , PARTITION sales_q2_2006 VALUES LESS THAN (TO_DATE('01-JUL-2006','dd-MON-yyyy'))
    TABLESPACE tsb
...
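For comparison, the manual equivalent in PostgreSQL 9.2 is a child table per range plus a CHECK constraint, with a trigger doing the routing (a sketch along the lines of this thread's daily partitions):

CREATE TABLE quotes_2012_09_10 (
    CHECK (received_time >= DATE '2012-09-10'
       AND received_time <  DATE '2012-09-11')
) INHERITS (quotes);
-- ...repeated per day, plus the insert trigger on quotes shown earlier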

>
> Cheers,
>
> Jeff

Re: Performance on Bulk Insert to Partitioned Table

От
Jeff Janes
Дата:
On Wednesday, December 26, 2012, Pavel Stehule wrote:
2012/12/27 Jeff Janes <jeff.janes@gmail.com>:
>
> More automated would be nice (i.e. one operation to make both the check
> constraints and the trigger, so they can't get out of sync), but would not
> necessarily mean faster.

 
<snip some benchmarking>

A native implementation could evaluate expressions significantly more efficiently, mainly simple expressions (this is significant for a large number of partitions), and could probably do tuple forwarding faster than a heavyweight INSERT statement (it is a question whether some overhead could be removed with more sophisticated syntax, by avoiding the record expansion).

If the main goal is to make it faster, I'd rather see all of plpgsql get faster, rather than just a special case of partitioning triggers.  For example, right now a CASE <expression> statement with 100 branches is about the same speed as an equivalent list of 100 elsif.  So it seems to be doing a linear search, when it could be doing a hash that should be a lot faster. 

 

So a native implementation could carry a significant speedup - mainly if we can distribute tuples without expression evaluation (currently done by the executor).

Making partitioning inserts native does open up other opportunities to make it faster, and also to make it administratively easier; but do we want to try to tackle both of those goals simultaneously?  I think the administrative aspects would come first.  (But I doubt I will be the one to implement either, so my vote doesn't count for much here.)


Cheers,

Jeff

Re: Performance on Bulk Insert to Partitioned Table

От
Jeff Janes
Дата:
On Monday, December 24, 2012, Charles Gomes wrote:
By the way, I've just re-wrote the code to target the partitions individually and I've got almost 4 times improvement.
Shouldn't it be faster to process the trigger, I would understand if there was no CPU left, but there is lots of cpu to chew.

Once you turned off hyperthreading, it was reporting 75% CPU usage.  Assuming that that accounting is perfect, that means you could only get 33% faster if you were to somehow start using all of the CPU.  So I don't think I'd call that a lot of CPU left.  And if you have 70 processes fighting for 8 cores, I'm not surprised you can't get above that CPU usage.

 
It seems that there will be no other way to speedup unless the insert code is partition aware.


There may be other ways, but that one will probably get you the most gain, especially if you use COPY or \copy.  Since the main goal of partitioning is to allow your physical storage layout to conspire with your bulk operations, it is hard to see how you can get the benefits of partitioning without having your bulk loading participate in that conspiracy.
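For example (a sketch, assuming the input can be pre-split by day into one file per partition - the file names here are hypothetical):

\copy quotes_2012_09_10 from '/data/quotes_2012_09_10.csv' with (format csv)
\copy quotes_2012_09_11 from '/data/quotes_2012_09_11.csv' with (format csv)
-- one direct load per child table, bypassing the routing trigger entirely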

 
Cheers,

Jeff

Re: Performance on Bulk Insert to Partitioned Table

От
Pavel Stehule
Дата:
2012/12/27 Jeff Janes <jeff.janes@gmail.com>:
> <snip - earlier exchange trimmed>
>
> If the main goal is to make it faster, I'd rather see all of plpgsql get
> faster, rather than just a special case of partitioning triggers.  For
> example, right now a CASE <expression> statement with 100 branches is about
> the same speed as an equivalent list of 100 elsif.  So it seems to be doing
> a linear search, when it could be doing a hash that should be a lot faster.

The bottleneck is not in PL/pgSQL directly; it is in the PostgreSQL expression executor. Personally I don't see any simple optimization - maybe some variant of JIT (for the expression executor) could improve performance.

Any other optimization requires a significant redesign of PL/pgSQL, which is not work I would start myself, because using plpgsql triggers for partitioning is bad usage of plpgsql - and I believe that after a native implementation any such work will be useless. Designing some generic C trigger, or a really full implementation, is better work.

Moreover, there is still the expensive INSERT statement - forwarding the tuple at the C level should be significantly faster, because it doesn't have to be generic.

>
>
>>
>>
>> So a native implementation could carry a significant speedup - mainly
>> if we can distribute tuples without expression evaluation (currently
>> done by the executor)
>
>
> Making partitioning inserts native does open up other opportunities to make
> it faster, and also to make it administratively easier; but do we want to
> try to tackle both of those goals simultaneously?  I think the
> administrative aspects would come first.  (But I doubt I will be the one to
> implement either, so my vote doesn't count for much here.)

Anybody who starts work on a native implementation will have my support (it is a feature that lots of customers need). I have customers that can support development, and I believe there are others. It really needs only one tenacious man, because it is two years of work.

Regards

Pavel



Re: Performance on Bulk Insert to Partitioned Table

От
Charles Gomes
Дата:
Pavel,

I've been trying to port the work of Emmanuel
http://archives.postgresql.org/pgsql-hackers/2008-12/msg01221.php


His implementation is pretty straightforward: a simple trigger doing constraint checks, with caching for bulk inserts.
So far this is what I've got: http://www.widesol.com/~charles/pgsql/partition.c
I had some issues, as he uses HeapTuples and on 9.2 I see a Slot.


----------------------------------------
> From: pavel.stehule@gmail.com
> Date: Thu, 27 Dec 2012 19:46:12 +0100
> Subject: Re: [PERFORM] Performance on Bulk Insert to Partitioned Table
> To: jeff.janes@gmail.com
> CC: charlesrg@outlook.com; ondrej.ivanic@gmail.com; pgsql-performance@postgresql.org
>
> <snip - Pavel's message quoted in full above>

Re: Performance on Bulk Insert to Partitioned Table

От
Stephen Frost
Дата:
* Jeff Janes (jeff.janes@gmail.com) wrote:
> If the main goal is to make it faster, I'd rather see all of plpgsql get
> faster, rather than just a special case of partitioning triggers.  For
> example, right now a CASE <expression> statement with 100 branches is about
> the same speed as an equivalent list of 100 elsif.  So it seems to be doing
> a linear search, when it could be doing a hash that should be a lot faster.

That's a nice thought, but I'm not sure that it'd really be practical.
CASE statements in plpgsql are completely general and really behave more
like an if/elsif tree than a C-style switch() statement or similar.  For
one thing, the expression need not use the same variables, could be
complex multi-variable conditionals, etc.

Figuring out that you could build a dispatch table for a given CASE
statement and then building it, storing it, and remembering to use it,
wouldn't be cheap.

On the other hand, I've actually *wanted* a simpler syntax on occasion.
I have no idea if there'd be a way to make it work, but this would be
kind of nice:

CASE OF x -- or whatever
  WHEN 1 THEN blah blah
  WHEN 2 THEN blah blah
  WHEN 3 THEN blah blah
END

which would be possible to build into a dispatch table by looking at the
type of x and the literals used in the overall CASE statement.  Even so,
there would likely be some number of WHEN conditions required before
it'd actually be more efficient to use, though perhaps getting rid of
the expression evaluation (if that'd be possible) would make up for it.

    Thanks,

        Stephen


Re: Performance on Bulk Insert to Partitioned Table

От
Pavel Stehule
Дата:
2012/12/27 Stephen Frost <sfrost@snowman.net>:
> <snip - Stephen's message quoted in full above>

I understand, but I am not happy with it. CASE is relatively complex: there is the SQL CASE too, and this would be a third variant of CASE. Maybe some simple CASE statements could be supported by the parser, with a local optimization (probably only for numerics, without casting). But it needs a relatively large amount of new code - will such code be accepted?

Regards

Pavel



Re: Performance on Bulk Insert to Partitioned Table

От
Emmanuel Cecchet
Дата:
Hi Charles,

I am not working on Postgres anymore and none of our patches were ever
accepted by the community.
The list of development I made can still be found at
http://wiki.postgresql.org/wiki/Aster%27s_Development_Projects

All the code related to these improvements must still be accessible in
the archive. If you can't find something, let me know, I'll try to find
it in my backups!

Happy holidays
Emmanuel


On 12/24/2012 13:36, Charles Gomes wrote:
> <snip - message quoted in full above>



Re: Performance on Bulk Insert to Partitioned Table

От
Vitalii Tymchyshyn
Дата:
There is a switch-like SQL CASE (the simple "case lvalue when rvalue" form); it should work like a C switch statement.

Also, for bulk inserts, have you tried "for each statement" triggers instead of "for each row"?
This would look like a lot of inserts and would not be fast in the single-row-insert case, but it can give you a benefit for huge inserts.
It would look like:
insert into quotes_2012_09_10 select * from new where cast(new.received_time as date) = '2012-09-10' ;
insert into quotes_2012_09_11 select * from new where cast(new.received_time as date) = '2012-09-11' ;
...

2012/12/27 Stephen Frost <sfrost@snowman.net>
<snip - Stephen's message quoted in full above>



--
Best regards,
 Vitalii Tymchyshyn

Re: Performance on Bulk Insert to Partitioned Table

От
Vitalii Tymchyshyn
Дата:
BTW: If "select count(*) from new" is fast, you can even choose the strategy in trigger depending on insert size.


2012/12/28 Vitalii Tymchyshyn <tivv00@gmail.com>
<snip - message quoted in full above>



--
Best regards,
 Vitalii Tymchyshyn

Re: Performance on Bulk Insert to Partitioned Table

От
Stephen Frost
Дата:
Vitalii,

* Vitalii Tymchyshyn (tivv00@gmail.com) wrote:
> There is switch-like sql case:
[...]
> It should work like C switch statement.

It does and it doesn't.  It behaves generally like a C switch statement,
but is much more flexible and therefore can't be optimized like a C
switch statement can be.

    Thanks,

        Stephen


Re: Performance on Bulk Insert to Partitioned Table

От
Pavel Stehule
Дата:
Hello

>
> Also, for bulk insert, have you tried "for each statement" triggers instead
> of "for each row"?
> This would look like a lot of inserts and would not be fast in
> single-row-insert case, but can give you benefit for huge inserts.
> It should look like
> insert into quotes_2012_09_10 select * from new where cast(new.received_time
> as date) = '2012-09-10' ;
> insert into quotes_2012_09_11 select * from new where cast(new.received_time
> as date) = '2012-09-11' ;
> ...

It has only one problem - PostgreSQL does not have the NEW and OLD relations for statement triggers.

Regards

Pavel


Re: Performance on Bulk Insert to Partitioned Table

От
Vitalii Tymchyshyn
Дата:
Why so? The basic form "case lvalue when rvalue then out ... end" is much like a switch.
The "case when condition then out ... end" form is a different, more complex beast, but the first one is essentially a switch. If it is currently transformed into
"case when lvalue = rvalue1 then out1 when lvalue = rvalue2 then out2 ... end", then this could be optimized, and that would benefit many users, not only the ones that use partitioning.


2012/12/28 Stephen Frost <sfrost@snowman.net>
<snip - Stephen's message quoted in full above>



--
Best regards,
 Vitalii Tymchyshyn

Re: Performance on Bulk Insert to Partitioned Table

От
Vitalii Tymchyshyn
Дата:
It's a pity. Why is this not listed in the "Compatibility" section of the CREATE TRIGGER documentation? I think this makes "for each statement" triggers not compatible with SQL99.


2012/12/28 Pavel Stehule <pavel.stehule@gmail.com>
<snip - message quoted in full above>



--
Best regards,
 Vitalii Tymchyshyn

Re: Performance on Bulk Insert to Partitioned Table

От
Pavel Stehule
Дата:
2012/12/28 Vitalii Tymchyshyn <tivv00@gmail.com>:
> Why so? Basic form "case lvalue when rvalue then out ... end" is much like
> switch.
> The "case when condition then out ... end" is different, more complex beast,
> but first one is essentially a switch. If it is now transformed into
> "case when lvalue = rvalue1 then out1 when lvalue=rvalue2 then out2 ... end"
> then this can be optimized and this would benefit many users, not only ones
> that use partitioning.

please, look at the plpgsql source code. PL/pgSQL is deliberately simple
and has no arithmetic unit of its own - everything is transformed to
SELECTs, and there is no optimization. But it is really short and
maintainable.

These SELECTs are evaluated only when necessary - but they are evaluated
by the PostgreSQL expression executor, not by PL/pgSQL directly -
PL/pgSQL cannot process constants by itself.

So any enhancement needs a PL/pgSQL redesign, and I am not sure this use
case has a clear benefit, because the expression bottleneck is only one
part of the partitioning-trigger bottleneck. More - if you need really
fast code, you can write your own in C - and it will be 10x faster than
any optimized PL/pgSQL code. And using C triggers in PostgreSQL is not
terrible work.

Using plpgsql row triggers for partitioning is not a good idea - it is
just a workaround from my perspective, and we should solve the source of
the problem - missing native support.

Regards

Pavel Stehule





Re: Performance on Bulk Insert to Partitioned Table

От
Jeff Janes
Дата:


On Friday, December 28, 2012, Vitalii Tymchyshyn wrote:

I had thought that too, but the catch is that the target expressions do not need to be constants when the function is created.  Indeed, they can even be volatile.

CREATE OR REPLACE FUNCTION foo(x integer)
RETURNS integer AS $$
BEGIN
case x
when 0 then return -5; 
when (random()*10)::integer then return 1; 
when (random()*10)::integer then return 2; 
when (random()*10)::integer then return 3; 
when (random()*10)::integer then return 4; 
when (random()*10)::integer then return 5; 
when (random()*10)::integer then return 6; 
when (random()*10)::integer then return 7; 
when (random()*10)::integer then return 8; 
when (random()*10)::integer then return 9; 
when (random()*10)::integer then return 10; 
else return -6;
end case;
END;
$$ LANGUAGE plpgsql;

Cheers,

Jeff

Re: Performance on Bulk Insert to Partitioned Table

От
Stephen Frost
Дата:
2012/12/28 Vitalii Tymchyshyn <tivv00@gmail.com>:
> Why so? Basic form "case lvalue when rvalue then out ... end" is much like
> switch.

Sorry, to be honest, I missed that distinction and didn't expect that to
work as-is, yet apparently it does.  Does it currently perform the same
as an if/elsif tree or is it implemented to actually use a table lookup?

* Pavel Stehule (pavel.stehule@gmail.com) wrote:
>> please, look at the plpgsql source code. PL/pgSQL is deliberately simple
>> and has no arithmetic unit of its own - everything is transformed to
>> SELECTs, and there is no optimization. But it is really short and
>> maintainable.

I was thinking we'd actually do this for all CASE statements, those in
plpgsql and those in regular SQL, if it's possible to do.  Hopefully
it'd be possible to do easily in plpgsql once the SQL-level CASE is
done.

> These SELECTs are evaluated only when necessary - but they are evaluated
> by the PostgreSQL expression executor, not by PL/pgSQL directly -
> PL/pgSQL cannot process constants by itself.

Right, but I wonder if we could pass the entire CASE tree to the
executor, with essentially pointers to the code blocks which will be
executed, and get back a function which we can call over and over that
takes whatever the parameter is and returns the 'right' pointer?

> So any enhancement needs a PL/pgSQL redesign, and I am not sure this
> use case has a clear benefit, because the expression bottleneck is only
> one part of the partitioning-trigger bottleneck. Moreover, if you need
> really fast code, you can write your own in C - it will be 10x
> faster than any optimized PL/pgSQL code. And writing C triggers for
> PostgreSQL is not terrible work.

It's quite a bit of work for people who don't know C or are
(understandably) concerned about writing things which can easily
segfault the entire backend.

> Using plpgsql row triggers for partitioning is not a good idea - it is
> just a workaround from my perspective, and we should solve the source of
> the problem - missing native support.

I agree that native partitioning would certainly be nice.  I was really
hoping that was going to happen for 9.3, but it seems unlikely now
(unless I've missed something).

    Thanks,

        Stephen


Re: Performance on Bulk Insert to Partitioned Table

От
Stephen Frost
Дата:
* Jeff Janes (jeff.janes@gmail.com) wrote:
> I had thought that too, but the catch is that the target expressions do not
> need to be constants when the function is created.  Indeed, they can even
> be volatile.

Right, any optimization in this regard would only work in certain
instances - e.g., when the 'WHEN' components are all constants and the
data type is something we can manage, etc.

    Thanks,

        Stephen


Re: Performance on Bulk Insert to Partitioned Table

От
Pavel Stehule
Дата:
2012/12/28 Stephen Frost <sfrost@snowman.net>:
> 2012/12/28 Vitalii Tymchyshyn <tivv00@gmail.com>:
>> Why so? Basic form "case lvalue when rvalue then out ... end" is much like
>> switch.
>
> Sorry, to be honest, I missed that distinction and didn't expect that to
> work as-is, yet apparently it does.  Does it currently perform the same
> as an if/elsif tree or is it implemented to actually use a table lookup?

Both IF and CASE have very similar implementations - no table lookup is
used; there is no special code path for matching constants.

>
> * Pavel Stehule (pavel.stehule@gmail.com) wrote:
>> please, look at the plpgsql source code. PL/pgSQL is very simple and has
>> no arithmetic unit of its own - everything is transformed into SELECTs,
>> with no optimization at all. But it is really short and maintainable.
>
> I was thinking we'd actually do this for all CASE statements, those in
> plpgsql and those in regular SQL, if it's possible to do.  Hopefully
> it'd be possible to do easily in plpgsql once the SQL-level CASE is
> done.
>

I am not sure - the SQL CASE is not especially optimized either :(

I see only one possible way: do most of the work when the CASE statement
is parsed, and bypass the executor. This could work, but I am afraid it
might slow down the first execution, and some use cases where there are
not too many paths, because we have to check all paths before execution.


Re: Performance on Bulk Insert to Partitioned Table

От
Jeff Janes
Дата:


On Thursday, December 20, 2012, Scott Marlowe wrote:

3: Someone above mentioned rules being faster than triggers.  In my
experience they're WAY slower than triggers but maybe that was just on
the older pg versions (8.3 and lower) we were doing this on.  I'd be
interested in seeing some benchmarks if rules have gotten faster or I
was just doing it wrong.

It apparently depends on how you use them.

To load 1e6 rows into the parent, redistributing to 100 partitions (rows evenly distributed over partitions) using RULEs, it took 14.5 seconds using an "insert into foo select * from foo_tmp" (not counting the time it took to prepopulate foo_tmp via \copy).
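
For concreteness, a minimal sketch of the RULE setup being described, with hypothetical names (foo(id int) split into children foo_1, foo_2, ... by ranges of id - not the actual test schema):

CREATE RULE foo_insert_1 AS ON INSERT TO foo
    WHERE (NEW.id >= 0 AND NEW.id < 10000)
    DO INSTEAD INSERT INTO foo_1 VALUES (NEW.*);

CREATE RULE foo_insert_2 AS ON INSERT TO foo
    WHERE (NEW.id >= 10000 AND NEW.id < 20000)
    DO INSTEAD INSERT INTO foo_2 VALUES (NEW.*);

-- ... and so on, one rule per partition.

Because rules rewrite statements rather than fire per row, the single "insert into foo select * from foo_tmp" is rewritten into one filtered INSERT ... SELECT per partition, which is why this form holds up well on bulk loads.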

This is about 25% faster than the 18.4 seconds it took to load the same data via \copy using a plpgsql trigger structured as nested IF ... ELSE ... END IF blocks that do a binary search over the partitions.
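
And a minimal sketch of that trigger shape, using the same hypothetical four-child layout (the 100-partition version just nests the IFs deeper):

CREATE OR REPLACE FUNCTION foo_insert_trigger()
RETURNS trigger AS $$
BEGIN
    -- binary search: halve the candidate range at each level
    IF NEW.id < 20000 THEN
        IF NEW.id < 10000 THEN
            INSERT INTO foo_1 VALUES (NEW.*);
        ELSE
            INSERT INTO foo_2 VALUES (NEW.*);
        END IF;
    ELSE
        IF NEW.id < 30000 THEN
            INSERT INTO foo_3 VALUES (NEW.*);
        ELSE
            INSERT INTO foo_4 VALUES (NEW.*);
        END IF;
    END IF;
    RETURN NULL;  -- the row goes to a child, not the parent
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER foo_insert_trigger
BEFORE INSERT ON foo
FOR EACH ROW EXECUTE PROCEDURE foo_insert_trigger();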
 
However if I didn't use \copy or "insert into...select", but rather used a Perl loop invoking normal single-row inserts (but all in a single transaction) with DBD::Pg, then the RULEs took 596 seconds, an astonishing seven times slower than the 83 seconds it took the previously mentioned plpgsql trigger to do the same thing.

This was under 9.1.7.  

In 9.2.2, it seems to get 3 times worse yet for RULEs in the insert loop.  But that result seems hard to believe, so I am repeating it.

Cheers

Jeff

Re: Performance on Bulk Insert to Partitioned Table

От
"Luciano Ernesto da Silva"
Дата:

UNSUBSCRIBE

 

From: pgsql-performance-owner@postgresql.org [mailto:pgsql-performance-owner@postgresql.org] On behalf of Jeff Janes
Sent: Friday, December 28, 2012 14:31
To: Scott Marlowe
Cc: Tom Lane; Charles Gomes; Ondrej Ivanič; pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Performance on Bulk Insert to Partitioned Table

 




Re: Performance on Bulk Insert to Partitioned Table

От
Pavel Stehule
Дата:
Hello

2012/12/28 Luciano Ernesto da Silva <luciano@cpd.ufrgs.br>:
> UNSUBSCRIBE
>
>
>
> From: pgsql-performance-owner@postgresql.org
> [mailto:pgsql-performance-owner@postgresql.org] On behalf of Jeff Janes
> Sent: Friday, December 28, 2012 14:31
> To: Scott Marlowe
> Cc: Tom Lane; Charles Gomes; Ondrej Ivanič; pgsql-performance@postgresql.org
> Subject: Re: [PERFORM] Performance on Bulk Insert to Partitioned Table
>
>
>
>
>
> On Thursday, December 20, 2012, Scott Marlowe wrote:
>
>
> 3: Someone above mentioned rules being faster than triggers. In my
> experience they're WAY slower than triggers but maybe that was just on
> the older pg versions (8.3 and lower) we were doing this on. I'd be
> interested in seeing some benchmarks if rules have gotten faster or I
> was just doing it wrong.
>
>

I am not sure, but I expect that the speed or slowness of rules depends
primarily on the number of partitions - more so than for triggers.

Regards

Pavel



Re: Performance on Bulk Insert to Partitioned Table

От
Ali Pouya
Дата:

2012/12/27 Charles Gomes <charlesrg@outlook.com>
So far that's what I got http://www.widesol.com/~charles/pgsql/partition.c
I had some issues, as he uses HeapTuples and on 9.2 I see a Slot.

Hi Charles,
I copied your C code partition.c and am trying to test it.

For compiling you suggest:

...
gcc -I "./" -fpic -c trigger.c
...

Where does the file trigger.c come from? Is it the one in the source directory
./src/backend/commands/ ?

Thanks a lot
Best regards
Ali



Re: Performance on Bulk Insert to Partitioned Table

От
Charles Gomes
Дата:
________________________________
> Date: Thu, 17 Jan 2013 15:38:14 +0100
> Subject: Re: [PERFORM] Performance on Bulk Insert to Partitioned Table
> From: alipouya2@gmail.com
> To: charlesrg@outlook.com
> CC: pgsql-performance@postgresql.org
>
>
> 2012/12/27 Charles Gomes
> <charlesrg@outlook.com<mailto:charlesrg@outlook.com>>
> So far that's what I got http://www.widesol.com/~charles/pgsql/partition.c
> I had some issues, as he uses HeapTuples and on 9.2 I see a Slot.
>
> Hi Charles,
> I copied your C code partition.c and am trying to test it.
>
> For compiling you suggest:
>
> ...
> gcc -I "./" -fpic -c trigger.c
> ...
>
> Where does the file trigger.c come from? Is it the one in the
> source directory ./src/backend/commands/ ?
>
> Thanks a lot
> Best regards
> Ali
>
>
>

Ali,
You can save the source as partition.c and use:

gcc -O2 -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Wendif-labels
-Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv -fpic -DREFINT_VERBOSE -I. -I.
-I"/usr/pgsql-9.2/include/server/" -D_GNU_SOURCE -c -o partition.o partition.c
 
gcc -O2 -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Wendif-labels
-Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv -fpic -Wl,--as-needed
-Wl,-rpath,'/usr/local/pgsql/lib',--enable-new-dtags -L/usr/pgsql-9.2/lib partition.o -lpgport -shared -o
/usr/pgsql-9.2/lib/partition.so
 
To compile you must have the postgresql-devel packages installed.
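
Once built, the library still has to be wired up from SQL. A sketch, assuming the library exports a V1-convention trigger function named partition_insert_trigger (a hypothetical name - check partition.c for the actual symbol):

CREATE OR REPLACE FUNCTION partition_insert_trigger()
RETURNS trigger
AS '/usr/pgsql-9.2/lib/partition.so', 'partition_insert_trigger'
LANGUAGE c;

The C function then replaces the PL/pgSQL one in the BEFORE INSERT ... FOR EACH ROW trigger on the parent table.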

I've added everything to github:
https://github.com/charlesrg/pgsql_partition/blob/master/partition.c

For more info check
http://www.charlesrg.com/linux/71-postgresql-partitioning-the-database-the-fastest-way

Re: Performance on Bulk Insert to Partitioned Table

От
Ali Pouya
Дата:
Ali,
You can save the source as partition.c and use:

gcc -O2 -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv -fpic -DREFINT_VERBOSE -I. -I. -I"/usr/pgsql-9.2/include/server/" -D_GNU_SOURCE   -c -o partition.o partition.c
 
gcc -O2 -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv -fpic -Wl,--as-needed -Wl,-rpath,'/usr/local/pgsql/lib',--enable-new-dtags -L/usr/pgsql-9.2/lib partition.o -lpgport -shared -o /usr/pgsql-9.2/lib/partition.so
 
To compile you must have the postgresql-devel packages installed.

I've added everything to github:
https://github.com/charlesrg/pgsql_partition/blob/master/partition.c

For more info check
http://www.charlesrg.com/linux/71-postgresql-partitioning-the-database-the-fastest-way

Thanks Charles,
Now the compilation is OK.
I'll test it and feed back more information, if any.
Best regards
Ali