Discussion: Performance die when COPYing to table with bigint PK

Performance die when COPYing to table with bigint PK

From:
Robert Ayrapetyan
Date:
Hello.

I've found strange behavior of my pg installation (tested both 8.4 and
9.0 - they behave the same) on the FreeBSD platform.
In short - when a table has a PK on a bigint field, COPY to that
table from a file becomes slower and slower as the table grows. When
the table reaches ~5GB, a COPY of 100k records may take up to 20 mins.
I've experimented with all params in the configs, moved the indexes to
a separate hdd, etc. - nothing made any improvement. However, once I
drop the 64-bit PK, a COPY of 100k records passes in seconds. An
interesting thing - the same table has other indexes, including
composite ones, but none of them include bigint fields; that's why I
reached the conclusion that the bug is connected with indexes on
bigint fields only.

In terms of IO the picture is the following: after the copy starts,
gstat shows 100% load on the index partition (as I mentioned above,
I've tried a separate hdd to keep the index tablespace), a large queue
(over 2k elements), and a constant slow write at ~2MB/s. The hdd
becomes completely unresponsive; even ls on an empty folder hangs for
a minute or so.

To avoid thoughts like "your hdd is slow, you haven't tuned
postgresql.conf", etc. - all the slowness disappears once the bigint
PK is dropped, while the other indexes on the same table remain alive.
And yes, I've tried drop PK / recreate PK, vacuum full analyze and all
the other things - nothing helped, only the drop helps.

Is this known and expected behavior?

--
Ayrapetyan Robert,
Comodo Anti-Malware Data Processing Analysis and Management System (CAMDPAMS)
http://repo-qa.camdpams.odessa.office.comodo.net/mediawiki/index.php

Re: Performance die when COPYing to table with bigint PK

From:
Simon Riggs
Date:
On Sun, Jul 31, 2011 at 2:51 PM, Robert Ayrapetyan
<robert.ayrapetyan@comodo.com> wrote:

> I've found strange behavior of my pg installation (tested both 8.4 and
> 9.0 - they behave same) on FreeBSD platform.
> In short - when some table have PK on bigint field - COPY to that
> table from file becomes slower and slower as table grows. When table
> reaches ~5GB - COPY of 100k records may take up to 20 mins. I've
> experimented with all params in configs, moved indexes to separate hdd
> etc - nothing made any improvement. However, once I'm dropping 64 bit
> PK - COPY of 100k records passes in seconds. Interesting thing - same
> table has other indexes, including composite ones, but none of them
> include bigint fields, that's why I reached decision that bug
> connected with indexes on bigint fields only.
>
> In terms of IO picture is following: after copy started gstat shows
> 100% load on index partition (as I mentioned above - I've tried
> separate hdd to keep index tablespace), large queue (over 2k
> elements), and constant slow write on speed of ~2MB\s. Hdd becomes
> completely unresponsive, even ls on empty folder hangs for minute or
> so.
>
> To avoid thoughts like "your hdd is slow, you haven't tuned
> postgresql.conf etc" - all slowness dissapears with drop of bigint PK,
> same time other indexes on same table remain alive. And yes - I've
> tried drop PK \ recreate PK, vacuum full analyze and all other things
> - nothing helped, only drop helps.
>
> Is this known and expected behavior?

This is a duplicate post with one on BUGS, being discussed there.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: Performance die when COPYing to table with bigint PK

From:
Vitalii Tymchyshyn
Date:
On 31.07.11 16:51, Robert Ayrapetyan wrote:
> Hello.
>
> I've found strange behavior of my pg installation (tested both 8.4 and
> 9.0 - they behave same) on FreeBSD platform.
> In short - when some table have PK on bigint field - COPY to that
> table from file becomes slower and slower as table grows. When table
> reaches ~5GB - COPY of 100k records may take up to 20 mins. I've
> experimented with all params in configs, moved indexes to separate hdd
> etc - nothing made any improvement. However, once I'm dropping 64 bit
> PK - COPY of 100k records passes in seconds. Interesting thing - same
> table has other indexes, including composite ones, but none of them
> include bigint fields, that's why I reached decision that bug
> connected with indexes on bigint fields only.
I did see this behavior, but for me it occurs for UNIQUE indexes only
(including PKs), independent of the field type.
You can check this by dropping the PK and creating it as a regular
non-unique index.

Best regards, Vitalii Tymchyshyn

Re: Performance die when COPYing to table with bigint PK

From:
Vitalii Tymchyshyn
Date:
On 02.08.11 11:26, Robert Ayrapetyan wrote:
> Seems this assumption is not right. Just created simple index on
> bigint column - situation with huge performance
> degradation repeated. Dropping this index solved COPY issues on the fly.
> So I'm still convinced - this bug relates to FreeBSD 64-bit + UFS +
> bigint column index
> (some of these may be superfluous, but I have no resources to check on
> different platforms with different filesystems).
Interrrresting. We also have FreeBSD x64 on UFS and are using bigint
(bigserial) keys. It seems I will need to perform more tests here,
because I do see similar problems. I can certainly do a copy of the
data with int4 keys and test the performance.
BTW: The thing we are going to try on the next upgrade is to change
the UFS block size from 16K to 8K. The problem I saw is that with the
default setting, UFS needs to read an additional 8K when postgresql
writes its page (and for an index, random writes can be vital).
Unfortunately, such a change requires a partition reformat and I can't
afford it for now.
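The read-modify-write penalty described above can be put into a tiny model (a sketch only; the 8K page and 16K/8K block sizes come from this message, everything else is an assumption):

```python
# Toy model of the read-modify-write penalty: PostgreSQL writes 8K pages,
# but a filesystem using 16K blocks must first read the surrounding 16K
# block before it can rewrite it (for uncached random writes). All numbers
# are illustrative, not measured.

DB_PAGE = 8 * 1024  # PostgreSQL page size in bytes

def io_per_random_page_write(fs_block: int) -> dict:
    """Bytes of physical I/O to persist one random DB_PAGE-sized write."""
    if fs_block > DB_PAGE:
        # Partial-block update: read the whole FS block, modify, write back.
        return {"read": fs_block, "write": fs_block}
    # The page covers the whole block: a plain write, no prior read.
    return {"read": 0, "write": DB_PAGE}

print(io_per_random_page_write(16 * 1024))  # 16K read + 16K write per page
print(io_per_random_page_write(8 * 1024))   # just an 8K write
```

With a 16K block the drive does twice the traffic plus an extra seek per dirtied index page, which is exactly where random index writes hurt.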

Best regards, Vitalii Tymchyshyn

Re: Performance die when COPYing to table with bigint PK

From:
Robert Ayrapetyan
Date:
Quite possible.
But anyway - I don't think the performance degradation should be so
huge in the case of UNIQUE indexes.

On Mon, Aug 1, 2011 at 12:06 PM, Vitalii Tymchyshyn <tivv00@gmail.com> wrote:
> On 31.07.11 16:51, Robert Ayrapetyan wrote:
>>
>> Hello.
>>
>> I've found strange behavior of my pg installation (tested both 8.4 and
>> 9.0 - they behave same) on FreeBSD platform.
>> In short - when some table have PK on bigint field - COPY to that
>> table from file becomes slower and slower as table grows. When table
>> reaches ~5GB - COPY of 100k records may take up to 20 mins. I've
>> experimented with all params in configs, moved indexes to separate hdd
>> etc - nothing made any improvement. However, once I'm dropping 64 bit
>> PK - COPY of 100k records passes in seconds. Interesting thing - same
>> table has other indexes, including composite ones, but none of them
>> include bigint fields, that's why I reached decision that bug
>> connected with indexes on bigint fields only.
>
> I did see this behavior, but as for me it occurs for UNIQUE indexes only
> (including PK), not dependent on field type.
> You can check this by dropping PK and creating it as a regular non-unique
> index.
>
> Best regards, Vitalii Tymchyshyn
>



--
Ayrapetyan Robert,
Comodo Anti-Malware Data Processing Analysis and Management System (CAMDPAMS)
http://repo-qa.camdpams.odessa.office.comodo.net/mediawiki/index.php

Re: Performance die when COPYing to table with bigint PK

From:
Robert Ayrapetyan
Date:
It seems this assumption is not right. I just created a simple index
on the bigint column - the situation with huge performance
degradation repeated. Dropping this index solved the COPY issues on the fly.
So I'm still convinced - this bug relates to FreeBSD 64-bit + UFS + a
bigint column index
(some of these may be superfluous, but I have no resources to check on
different platforms with different filesystems).

On Mon, Aug 1, 2011 at 12:15 PM, Robert Ayrapetyan
<robert.ayrapetyan@comodo.com> wrote:
> Quite possible.
> But anyway - I don't think performance degradation must be so huge in
> case of using UNIQUE indexes.
>
> On Mon, Aug 1, 2011 at 12:06 PM, Vitalii Tymchyshyn <tivv00@gmail.com> wrote:
>> On 31.07.11 16:51, Robert Ayrapetyan wrote:
>>>
>>> Hello.
>>>
>>> I've found strange behavior of my pg installation (tested both 8.4 and
>>> 9.0 - they behave same) on FreeBSD platform.
>>> In short - when some table have PK on bigint field - COPY to that
>>> table from file becomes slower and slower as table grows. When table
>>> reaches ~5GB - COPY of 100k records may take up to 20 mins. I've
>>> experimented with all params in configs, moved indexes to separate hdd
>>> etc - nothing made any improvement. However, once I'm dropping 64 bit
>>> PK - COPY of 100k records passes in seconds. Interesting thing - same
>>> table has other indexes, including composite ones, but none of them
>>> include bigint fields, that's why I reached decision that bug
>>> connected with indexes on bigint fields only.
>>
>> I did see this behavior, but as for me it occurs for UNIQUE indexes only
>> (including PK), not dependent on field type.
>> You can check this by dropping PK and creating it as a regular non-unique
>> index.
>>
>> Best regards, Vitalii Tymchyshyn
>>
>
>
>
> --
> Ayrapetyan Robert,
> Comodo Anti-Malware Data Processing Analysis and Management System (CAMDPAMS)
> http://repo-qa.camdpams.odessa.office.comodo.net/mediawiki/index.php
>



--
Ayrapetyan Robert,
Comodo Anti-Malware Data Processing Analysis and Management System (CAMDPAMS)
http://repo-qa.camdpams.odessa.office.comodo.net/mediawiki/index.php

Re: Performance die when COPYing to table with bigint PK

From:
"Kevin Grittner"
Date:
Robert Ayrapetyan <robert.ayrapetyan@comodo.com> wrote:

> So I'm still convinced - this bug relates to FreeBSD 64-bit + UFS
> + bigint column index
> (some of these may be superfluous, but I have no resources to
> check on different platforms with different filesystems).

Linux 64 bit XFS bigint column index only shows a slightly longer
run time for bigint versus int here.  What timings do you get for
the insert statements if you run the following in your environment?

create table bi (big bigint not null, medium int not null);
insert into bi with x(n) as (select generate_series(1, 1000000))
select n + 5000000000, n from x;
\timing on
truncate table bi; insert into bi with x(n) as (select
generate_series(1, 1000000)) select n + 5000000000, n from x;
truncate table bi; insert into bi with x(n) as (select
generate_series(1, 1000000)) select n + 5000000000, n from x;
truncate table bi; insert into bi with x(n) as (select
generate_series(1, 1000000)) select n + 5000000000, n from x;
create unique index bi_medium on bi (medium);
truncate table bi; insert into bi with x(n) as (select
generate_series(1, 1000000)) select n + 5000000000, n from x;
truncate table bi; insert into bi with x(n) as (select
generate_series(1, 1000000)) select n + 5000000000, n from x;
truncate table bi; insert into bi with x(n) as (select
generate_series(1, 1000000)) select n + 5000000000, n from x;
drop index bi_medium;
create unique index bi_big on bi (big);
truncate table bi; insert into bi with x(n) as (select
generate_series(1, 1000000)) select n + 5000000000, n from x;
truncate table bi; insert into bi with x(n) as (select
generate_series(1, 1000000)) select n + 5000000000, n from x;
truncate table bi; insert into bi with x(n) as (select
generate_series(1, 1000000)) select n + 5000000000, n from x;
\timing off
drop table bi;

Here's what I get:

Time: 1629.141 ms
Time: 1638.060 ms
Time: 1711.833 ms

Time: 4151.953 ms
Time: 4602.679 ms
Time: 5107.259 ms

Time: 4654.060 ms
Time: 5158.157 ms
Time: 5101.110 ms

-Kevin

Re: Performance die when COPYing to table with bigint PK

From:
"Kevin Grittner"
Date:
Robert Ayrapetyan <robert.ayrapetyan@comodo.com> wrote:
> Kevin Grittner <Kevin.Grittner@wicourts.gov> wrote:

>> What timings do you get for the insert statements if you run the
>> following in your environment?

>> Here's what I get:
>>
>> Time: 1629.141 ms
>> Time: 1638.060 ms
>> Time: 1711.833 ms
>>
>> Time: 4151.953 ms
>> Time: 4602.679 ms
>> Time: 5107.259 ms
>>
>> Time: 4654.060 ms
>> Time: 5158.157 ms
>> Time: 5101.110 ms

> Timings for your test:

> [no index]
> Time: 2789.607 ms
> Time: 2959.679 ms
> Time: 3651.206 ms

> [int index]
> Time: 5671.883 ms
> Time: 5668.894 ms
> Time: 6530.010 ms

> [bigint index]
> Time: 8093.276 ms
> Time: 5988.694 ms
> Time: 5702.236 ms

> [regarding tests which do show the problem]
> tried same with 2 columns (bigint and int) - it didn't produced
> such effect probably because data volume has critical effect.

Based on what you're showing, this is almost certainly just a matter
of pushing your volume of active data above the threshold of what
your cache holds, forcing it to do disk access rather than RAM
access for a significant portion of the reads.
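Kevin's point can be sketched with a toy cost model (the cache size and latencies below are assumptions for illustration, not measurements from this thread):

```python
# Toy model of the cache-threshold effect: while the whole index fits in
# cache, every probe costs ~RAM latency; once it outgrows the cache, a
# growing share of probes becomes a random disk seek, and average cost
# jumps by orders of magnitude.

RAM_NS = 100            # ~100 ns for a cached page access (assumed)
DISK_NS = 8_000_000     # ~8 ms for a random seek on a spinning disk (assumed)

def avg_probe_ns(index_bytes: float, cache_bytes: float) -> float:
    """Expected cost of one uniformly random index page probe."""
    hit_ratio = min(1.0, cache_bytes / index_bytes)
    return hit_ratio * RAM_NS + (1.0 - hit_ratio) * DISK_NS

CACHE = 2 * 1024**3  # pretend 2 GB of RAM effectively caches index pages
for gb in (1, 2, 4, 8):
    ns = avg_probe_ns(gb * 1024**3, CACHE)
    print(f"{gb} GB index: {ns / 1e6:8.3f} ms per random probe")
```

The model shows a cliff, not a gradual slope: a 1 GB index probes in nanoseconds, a 4 GB one averages milliseconds, which matches the sudden degradation around a given table size rather than a smooth slowdown.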

-Kevin

Re: Performance die when COPYing to table with bigint PK

From:
Vitalii Tymchyshyn
Date:
On 04.08.11 18:59, Kevin Grittner wrote:
> Robert Ayrapetyan<robert.ayrapetyan@comodo.com>  wrote:
>> Kevin Grittner<Kevin.Grittner@wicourts.gov>  wrote:
>
>> [regarding tests which do show the problem]
>> tried same with 2 columns (bigint and int) - it didn't produced
>> such effect probably because data volume has critical effect.
>
> Based on what you're showing, this is almost certainly just a matter
> of pushing your volume of active data above the threshold of what
> your cache holds, forcing it to do disk access rather than RAM
> access for a significant portion of the reads.
>
> -Kevin
Yep. Seems so. Plus the famous "you'd better insert the data first,
then create the indexes".
On my database it takes twice as long for int8 as for int4 to insert
data.
Also, it takes ~twice as long (2 hours) to add 200K rows to 200M rows
as to build an index over 200M rows (1 hour).

Best regards, Vitalii Tymchyshyn

Re: Performance die when COPYing to table with bigint PK

From:
"Kevin Grittner"
Date:
Robert Ayrapetyan <robert.ayrapetyan@comodo.com> wrote:

> If you look at the rest of my mail - you would notice 50 times
> difference in performance.
> What you would say?

That accessing a page from RAM is more than 50 times as fast as a
random access of that page from disk.

-Kevin

Re: Performance die when COPYing to table with bigint PK

From:
Robert Ayrapetyan
Date:
Hi.

Timings for your test:

foo=# create table bi (big bigint not null, medium int not null);
CREATE TABLE
foo=# insert into bi with x(n) as (select generate_series(1, 1000000))
foo-# select n + 5000000000, n from x;
INSERT 0 1000000
foo=# \timing on
Timing is on.
foo=# truncate table bi; insert into bi with x(n) as (select
TRUNCATE TABLE
Time: 211.205 ms
foo(# generate_series(1, 1000000)) select n + 5000000000, n from x;
INSERT 0 1000000
Time: 2789.607 ms
foo=# truncate table bi; insert into bi with x(n) as (select
TRUNCATE TABLE
Time: 206.712 ms
foo(# generate_series(1, 1000000)) select n + 5000000000, n from x;
INSERT 0 1000000
Time: 2959.679 ms
foo=# truncate table bi; insert into bi with x(n) as (select
TRUNCATE TABLE
Time: 594.584 ms
foo(# generate_series(1, 1000000)) select n + 5000000000, n from x;
INSERT 0 1000000
Time: 3651.206 ms
foo=# create unique index bi_medium on bi (medium);
CREATE INDEX
Time: 781.407 ms
foo=# truncate table bi; insert into bi with x(n) as (select
TRUNCATE TABLE
Time: 42.177 ms
foo(# generate_series(1, 1000000)) select n + 5000000000, n from x;
INSERT 0 1000000
Time: 5671.883 ms
foo=# truncate table bi; insert into bi with x(n) as (select
TRUNCATE TABLE
Time: 139.418 ms
foo(# generate_series(1, 1000000)) select n + 5000000000, n from x;
INSERT 0 1000000
Time: 5668.894 ms
foo=# truncate table bi; insert into bi with x(n) as (select
TRUNCATE TABLE
Time: 204.479 ms
foo(# generate_series(1, 1000000)) select n + 5000000000, n from x;
INSERT 0 1000000
Time: 6530.010 ms
foo=# drop index bi_medium;
DROP INDEX
Time: 212.038 ms
foo=# create unique index bi_big on bi (big);
CREATE INDEX
Time: 650.492 ms
foo=# truncate table bi; insert into bi with x(n) as (select
TRUNCATE TABLE
Time: 39.818 ms
foo(# generate_series(1, 1000000)) select n + 5000000000, n from x;
INSERT 0 1000000
Time: 8093.276 ms
foo=# truncate table bi; insert into bi with x(n) as (select
TRUNCATE TABLE
Time: 282.165 ms
foo(# generate_series(1, 1000000)) select n + 5000000000, n from x;
INSERT 0 1000000
Time: 5988.694 ms
foo=# truncate table bi; insert into bi with x(n) as (select
TRUNCATE TABLE
Time: 245.859 ms
foo(# generate_series(1, 1000000)) select n + 5000000000, n from x;
INSERT 0 1000000
Time: 5702.236 ms
foo=# \timing off
Timing is off.


Now please perform mine:

CREATE TABLESPACE tblsp_ix LOCATION '/foo';
CREATE SCHEMA test;
CREATE TABLE test.t
(
    id_big bigint, --PRIMARY KEY USING INDEX TABLESPACE tblsp_ix,
    ts timestamp NOT NULL,
    ip inet,
    id_medium integer NOT NULL,
    id_small smallint NOT NULL,
    id_smalll smallint NOT NULL
);
CREATE INDEX ix_t ON test.t
    USING btree (ts, ip, id_medium, id_small) TABLESPACE tblsp_ix;

gen_data.csh
-------------cut here-----------------------------------------------------------
#!/bin/tcsh
set f = $1
set lines_cnt = $2
rm ${f}
set id_big = -2147483648
set time_t = 1000000000
set ts = `date -r ${time_t}`
set ip = "127.0.0.1"
set id_medium = -2147483648
set id_small = 0
set echo_style = both
while ( $lines_cnt > 0 )
    echo "${id_big}\t${ts}\t${ip}\t${id_medium}\t${id_small}\t${id_small}" >> ${f}
    @ id_big = ${id_big} + 1
    @ time_t = ${time_t} + 1
    @ id_medium = ${id_medium} + 1
    @ lines_cnt = ${lines_cnt} - 1
end
exit 0
-------------cut here-----------------------------------------------------------

time ./gen_data.csh app.data 100000
9.564u 2.487s 0:12.05 99.9%     420+1113k 0+51io 0pf+0w

copy_data.csh
-------------cut here-----------------------------------------------------------
#!/bin/tcsh
set f = $1
set cnt = $2
while ( $cnt > 0 )
    time psql -d foo -c "COPY test.t(id_big, ts, ip, id_medium, id_small, id_smalll) from '$f'"
    @ cnt = ${cnt} - 1
end
exit 0
-------------cut here-----------------------------------------------------------

time copy_data.csh /aaa/app.data 100
...
0.000u 0.027s 0:01.55 1.2%      474+1254k 0+0io 0pf+0w
COPY 100000
...
(~1-3 sec for each of the 100 iterations, with 3-4 spikes to 5 secs max)

CREATE INDEX ix_t_big ON test.t USING btree (id_big) TABLESPACE tblsp_ix;

time copy_data.csh /aaa/app.data 100
(the show begins from iteration # ~20):
COPY 100000
0.000u 0.005s 0:20.70 0.0%      0+0k 0+0io 0pf+0w
COPY 100000
0.000u 0.005s 0:06.50 0.0%      0+0k 0+0io 0pf+0w
COPY 100000
0.000u 0.037s 0:03.44 0.8%      704+514k 0+0io 0pf+0w
COPY 100000
0.007u 0.029s 0:04.55 0.4%      808+1746k 0+0io 0pf+0w
COPY 100000
0.005u 0.000s 0:03.60 0.0%      0+0k 0+0io 0pf+0w
COPY 100000
0.005u 0.000s 0:02.55 0.0%      0+0k 0+0io 0pf+0w
COPY 100000
0.000u 0.037s 0:03.03 0.9%      469+197k 0+0io 0pf+0w
COPY 100000
0.000u 0.036s 0:03.85 0.7%      526+1393k 0+0io 0pf+0w
COPY 100000
0.005u 0.000s 0:06.66 0.0%      0+0k 0+0io 0pf+0w
COPY 100000
0.000u 0.037s 0:02.73 1.0%      526+1393k 0+0io 0pf+0w
COPY 100000
0.000u 0.005s 0:11.85 0.0%      0+0k 0+0io 0pf+0w
COPY 100000
0.000u 0.022s 0:02.56 0.7%      492+1238k 0+0io 0pf+0w
COPY 100000
0.007u 0.022s 0:02.46 0.8%      650+1328k 0+0io 0pf+0w
COPY 100000
0.006u 0.031s 0:04.71 0.6%      692+525k 0+0io 0pf+0w
COPY 100000
0.000u 0.039s 0:29.10 0.1%      526+1393k 0+0io 0pf+0w
COPY 100000
0.000u 0.036s 0:36.29 0.0%      538+1164k 0+0io 0pf+0w
COPY 100000
0.000u 0.037s 0:43.77 0.0%      526+1393k 0+0io 0pf+0w
COPY 100000
0.000u 0.036s 1:01.94 0.0%      538+1164k 0+0io 0pf+0w
COPY 100000
0.007u 0.029s 0:13.99 0.1%      808+2074k 0+0io 0pf+0w
COPY 100000
0.003u 0.005s 0:46.02 0.0%      0+0k 0+0io 0pf+0w
COPY 100000
0.000u 0.031s 0:45.58 0.0%      316+836k 0+0io 0pf+0w
COPY 100000
0.000u 0.038s 1:00.39 0.0%      526+1393k 0+0io 0pf+0w
COPY 100000
0.000u 0.036s 0:24.38 0.1%      538+1164k 0+0io 0pf+0w
COPY 100000
0.000u 0.037s 0:41.32 0.0%      538+1382k 0+0io 0pf+0w
COPY 100000
0.000u 0.005s 0:46.13 0.0%      0+0k 0+0io 0pf+0w
COPY 100000
0.005u 0.000s 0:43.15 0.0%      0+0k 0+0io 0pf+0w
COPY 100000
0.000u 0.005s 0:45.59 0.0%      0+0k 0+0io 0pf+0w
COPY 100000
0.000u 0.005s 1:54.92 0.0%      0+0k 0+0io 0pf+0w
COPY 100000
0.000u 0.037s 2:22.47 0.0%      538+1382k 0+0io 0pf+0w
COPY 100000
0.000u 0.005s 1:40.65 0.0%      0+0k 0+0io 0pf+0w
COPY 100000
0.006u 0.020s 1:43.52 0.0%      650+1328k 0+0io 0pf+0w
COPY 100000
0.000u 0.036s 1:43.33 0.0%      538+1164k 0+0io 0pf+0w
COPY 100000
0.000u 0.036s 1:47.00 0.0%      526+1393k 0+0io 0pf+0w
COPY 100000
0.000u 0.036s 2:18.94 0.0%      538+1164k 0+0io 0pf+0w

from that moment every iteration took more than 1 min, and I interrupted the test.

DROP INDEX test.ix_t_big;

time copy_data.csh /aaa/app.data 100
COPY 100000
0.000u 0.005s 0:02.42 0.0%      0+0k 0+0io 0pf+0w
COPY 100000
0.007u 0.029s 0:01.88 1.0%      808+2074k 0+0io 0pf+0w
COPY 100000
0.000u 0.005s 0:01.83 0.0%      0+0k 0+0io 0pf+0w
COPY 100000
0.000u 0.005s 0:01.75 0.0%      0+0k 0+0io 0pf+0w
COPY 100000
0.000u 0.005s 0:01.82 0.0%      0+0k 0+0io 0pf+0w
COPY 100000
0.000u 0.037s 0:01.81 1.6%      526+1393k 0+0io 0pf+0w
COPY 100000
0.000u 0.036s 0:01.84 1.6%      538+1164k 0+0io 0pf+0w
COPY 100000
0.000u 0.036s 0:01.86 1.6%      421+1114k 0+0io 0pf+0w
COPY 100000
0.000u 0.036s 0:01.77 1.6%      538+1164k 0+0io 0pf+0w
...
Everything returned to a good performance state.

With the number of rows > 50 mln, all the numbers in the test with an
index on the bigint column are multiplied by 20, while without the
index, even at 200 mln rows, the speed remains constant (1-2 sec per
100k-row file).

P.S. I tried the same with 2 columns (bigint and int) - it didn't
produce such an effect, probably because data volume has a critical effect.


On Tue, Aug 2, 2011 at 8:41 PM, Kevin Grittner
<Kevin.Grittner@wicourts.gov> wrote:
> Robert Ayrapetyan <robert.ayrapetyan@comodo.com> wrote:
>
>> So I'm still convinced - this bug relates to FreeBSD 64-bit + UFS
>> + bigint column index
>> (some of these may be superfluous, but I have no resources to
>> check on different platforms with different filesystems).
>
> Linux 64 bit XFS bigint column index only shows a slightly longer
> run time for bigint versus int here.  What timings do you get for
> the insert statements if you run the following in your environment?
>
> create table bi (big bigint not null, medium int not null);
> insert into bi with x(n) as (select generate_series(1, 1000000))
> select n + 5000000000, n from x;
> \timing on
> truncate table bi; insert into bi with x(n) as (select
> generate_series(1, 1000000)) select n + 5000000000, n from x;
> truncate table bi; insert into bi with x(n) as (select
> generate_series(1, 1000000)) select n + 5000000000, n from x;
> truncate table bi; insert into bi with x(n) as (select
> generate_series(1, 1000000)) select n + 5000000000, n from x;
> create unique index bi_medium on bi (medium);
> truncate table bi; insert into bi with x(n) as (select
> generate_series(1, 1000000)) select n + 5000000000, n from x;
> truncate table bi; insert into bi with x(n) as (select
> generate_series(1, 1000000)) select n + 5000000000, n from x;
> truncate table bi; insert into bi with x(n) as (select
> generate_series(1, 1000000)) select n + 5000000000, n from x;
> drop index bi_medium;
> create unique index bi_big on bi (big);
> truncate table bi; insert into bi with x(n) as (select
> generate_series(1, 1000000)) select n + 5000000000, n from x;
> truncate table bi; insert into bi with x(n) as (select
> generate_series(1, 1000000)) select n + 5000000000, n from x;
> truncate table bi; insert into bi with x(n) as (select
> generate_series(1, 1000000)) select n + 5000000000, n from x;
> \timing off
> drop table bi;
>
> Here's what I get:
>
> Time: 1629.141 ms
> Time: 1638.060 ms
> Time: 1711.833 ms
>
> Time: 4151.953 ms
> Time: 4602.679 ms
> Time: 5107.259 ms
>
> Time: 4654.060 ms
> Time: 5158.157 ms
> Time: 5101.110 ms
>
> -Kevin
>



--
Ayrapetyan Robert,
Comodo Anti-Malware Data Processing Analysis and Management System (CAMDPAMS)
http://repo-qa.camdpams.odessa.office.comodo.net/mediawiki/index.php

Re: Performance die when COPYing to table with bigint PK

From:
Robert Ayrapetyan
Date:
If you look at the rest of my mail, you would notice a 50-times
difference in performance.
What would you say to that?

On Thu, Aug 4, 2011 at 7:11 PM, Vitalii Tymchyshyn <tivv00@gmail.com> wrote:
> On 04.08.11 18:59, Kevin Grittner wrote:
>>
>> Robert Ayrapetyan<robert.ayrapetyan@comodo.com>  wrote:
>>>
>>> Kevin Grittner<Kevin.Grittner@wicourts.gov>  wrote:
>>
>>> [regarding tests which do show the problem]
>>> tried same with 2 columns (bigint and int) - it didn't produced
>>> such effect probably because data volume has critical effect.
>>
>> Based on what you're showing, this is almost certainly just a matter
>> of pushing your volume of active data above the threshold of what
>> your cache holds, forcing it to do disk access rather than RAM
>> access for a significant portion of the reads.
>>
>> -Kevin
>
> Yep. Seems so. Plus famous "you'd better insert data, then create indexes".
> On my database it takes twice the time for int8 then for int4 to insert
> data.
> Also it takes ~twice a time (2 hours) to add 200K of rows to 200M of rows
> than to make an index over 200M of rows (1 hour).
>
> Best regards, Vitalii Tymchyshyn
>



--
Ayrapetyan Robert,
Comodo Anti-Malware Data Processing Analysis and Management System (CAMDPAMS)
http://repo-qa.camdpams.odessa.office.comodo.net/mediawiki/index.php

Re: Performance die when COPYing to table with bigint PK

From:
Robert Ayrapetyan
Date:
Everything you are saying is contradicted by the following:

in the experiment I replaced the bigint index:

CREATE INDEX ix_t_big ON test.t USING btree (id_big) TABLESPACE tblsp_ix;

with 4 (!) other indexes:

CREATE INDEX ix_t2 ON test.t USING btree (ip) TABLESPACE tblsp_ix;
CREATE INDEX ix_t3 ON test.t USING btree (id_small) TABLESPACE tblsp_ix;
CREATE INDEX ix_t4 ON test.t USING btree (id_smalll) TABLESPACE tblsp_ix;
CREATE INDEX ix_t5 ON test.t USING btree (ts) TABLESPACE tblsp_ix;

which are definitely larger than the single bigint index.

0.000u 0.005s 0:13.23 0.0%      0+0k 0+0io 0pf+0w
COPY 100000
0.000u 0.035s 0:05.08 0.5%      421+1114k 0+0io 0pf+0w
COPY 100000
0.000u 0.036s 0:19.28 0.1%      526+1393k 0+0io 0pf+0w
COPY 100000
0.000u 0.005s 0:05.56 0.0%      0+0k 0+0io 0pf+0w
COPY 100000
0.006u 0.012s 0:05.57 0.1%      984+1820k 0+0io 0pf+0w
COPY 100000
0.007u 0.029s 0:05.20 0.3%      808+1746k 0+0io 0pf+0w
COPY 100000
0.005u 0.000s 0:05.35 0.0%      0+0k 0+0io 0pf+0w
COPY 100000
0.000u 0.011s 0:05.92 0.1%      316+836k 0+0io 0pf+0w
COPY 100000
0.000u 0.005s 0:12.08 0.0%      0+0k 0+0io 0pf+0w
COPY 100000
0.000u 0.029s 0:05.46 0.3%      808+2074k 0+0io 0pf+0w
COPY 100000
0.002u 0.002s 0:05.35 0.0%      0+0k 0+0io 0pf+0w
COPY 100000
0.000u 0.005s 0:06.52 0.0%      0+0k 0+0io 0pf+0w

Insertions became 4-5 times slower, which is ok.

Nothing comes close to even half a minute, while the single bigint
index consistently gives more than a minute, and even two, for 100k records.




On Thu, Aug 4, 2011 at 8:22 PM, Kevin Grittner
<Kevin.Grittner@wicourts.gov> wrote:
> Robert Ayrapetyan <robert.ayrapetyan@comodo.com> wrote:
>
>> If you look at the rest of my mail - you would notice 50 times
>> difference in performance.
>> What you would say?
>
> That accessing a page from RAM is more than 50 times as fast as a
> random access of that page from disk.
>
> -Kevin
>



--
Ayrapetyan Robert,
Comodo Anti-Malware Data Processing Analysis and Management System (CAMDPAMS)
http://repo-qa.camdpams.odessa.office.comodo.net/mediawiki/index.php

Re: Performance die when COPYing to table with bigint PK

From:
Віталій Тимчишин
Date:

In my tests it greatly depends on whether the index writes are random or sequential. My test time goes down from a few hours to seconds if I add to the end of the index.
As for me, the best comparison would be to make two int4 columns with the same data as in the int8, plus two indexes, then perform the test. My bet is it will be slower than int8.
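The random-vs-sequential difference can be illustrated with a toy leaf-page model (the 200-keys-per-page geometry is an assumption, and page splits are ignored):

```python
# Why appending to the end of an index is fast: a monotonically increasing
# key always lands on the rightmost leaf page, while a uniformly random key
# dirties a different leaf almost every time.

import bisect
import random

KEYS_PER_PAGE = 200  # assumed leaf fan-out

def leaf_pages_touched(existing: list, batch: list) -> int:
    """How many distinct leaf pages a batch of inserts would dirty."""
    touched = set()
    for key in batch:
        pos = bisect.bisect_left(existing, key)
        touched.add(pos // KEYS_PER_PAGE)
    return len(touched)

random.seed(1)
existing = list(range(0, 2_000_000, 2))          # 1M even keys already indexed
append_batch = list(range(2_000_000, 2_001_000))            # keys past the max
random_batch = random.sample(range(1, 2_000_000, 2), 1000)  # random odd keys

print(leaf_pages_touched(existing, append_batch))  # 1 page: stays hot in cache
print(leaf_pages_touched(existing, random_batch))  # hundreds of cold pages
```

One hot page versus hundreds of scattered ones is the difference between a cache-friendly append workload and the random-write storm described earlier in the thread.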

On Thursday, August 4, 2011, Robert Ayrapetyan <robert.ayrapetyan@comodo.com> wrote:
> All you are saying disproves following:
>
> in experiment I replaces bigint index:
>
> CREATE INDEX ix_t_big ON test.t USING btree (id_big) TABLESPACE tblsp_ix;
>
> with 4 (!) other indexes:
>
>>> If you look at the rest of my mail - you would notice 50 times
>>> difference in performance.
>>> What you would say?
>>
>> That accessing a page from RAM is more than 50 times as fast as a
>> random access of that page from disk.
>>
>> -Kevin
>>
>
>
>
> --
> Ayrapetyan Robert,
> Comodo Anti-Malware Data Processing Analysis and Management System (CAMDPAMS)
> http://repo-qa.camdpams.odessa.office.comodo.net/mediawiki/index.php
>

--
Best regards,
 Vitalii Tymchyshyn

Re: Performance die when COPYing to table with bigint PK

From:
Vitalii Tymchyshyn
Date:
On 05.08.11 11:44, Robert Ayrapetyan wrote:
> Yes, you are right. Performance become even more awful.
> Can some techniques from pg_bulkload be implemented in postgres core?
> Current performance is not suitable for any enterprise-wide production system.
BTW: I was thinking about indexes this morning.
How about the following feature:
Implement a new index type that has two "zones" - old & new. The new
zone is of a fixed, configurable size, say 100 pages (800 K).
Any search goes into both zones, so as soon as the index is larger
than 800K, the search must be done twice.
As soon as the new zone hits its size limit, part (maybe only one?) of
its pages is merged with the old zone. The merge is "rolling" - if the
last merge stopped at entry "X", the next merge will start at the
entry right after X.

As for me, this should go a long way toward solving the large-index
insert problem:
1) Inserts into the new zone must be quick because it's small and hot
in the cache.
2) During the merge, writes will be grouped, because items with nearby
keys (for a B-tree) or hashes (for a hash index) will go to a small
subset of the "old" zone's pages. In the future, the merge could also
be done by autovacuum in the background.
Yes, we get a dual index search, but the new zone will be hot, so this
won't make it twice as costly.
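The two-zone scheme can be sketched in a few lines (pure illustration, not PostgreSQL code; unlike the proposal, this sketch merges the whole new zone at once rather than rolling page by page):

```python
# Minimal sketch of a two-zone index: inserts go into a small, cache-hot
# "new" zone; lookups probe both zones; when the new zone hits its limit
# it is merged into the big sorted "old" zone in one sequential pass.

import bisect
from heapq import merge

class TwoZoneIndex:
    def __init__(self, new_zone_limit: int = 1000):
        self.old = []                 # large sorted run: the bulk of the index
        self.new = []                 # small sorted run: recent inserts
        self.limit = new_zone_limit

    def insert(self, key) -> None:
        bisect.insort(self.new, key)  # cheap: the new zone is tiny and hot
        if len(self.new) >= self.limit:
            # One sequential pass; neighbouring keys land on neighbouring
            # "old" pages, so the writes are grouped instead of random.
            self.old = list(merge(self.old, self.new))
            self.new = []

    def __contains__(self, key) -> bool:
        # The price of the scheme: every search probes both zones.
        for zone in (self.new, self.old):
            i = bisect.bisect_left(zone, key)
            if i < len(zone) and zone[i] == key:
                return True
        return False

idx = TwoZoneIndex(new_zone_limit=100)
for k in range(0, 5000, 3):
    idx.insert(k)
print(3 in idx, 4 in idx)  # True False
```

This is essentially the idea behind log-structured merge trees: trade a second (cheap, in-cache) probe for sequential rather than random index writes.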

Best regards, Vitalii Tymchyshyn


Re: Performance die when COPYing to table with bigint PK

From:
Robert Ayrapetyan
Date:
Yes, you are right. Performance became even more awful.
Can some techniques from pg_bulkload be implemented in the postgres core?
The current performance is not suitable for any enterprise-wide production system.

2011/8/5 Віталій Тимчишин <tivv00@gmail.com>:
>
> In my tests it greatly depends on if index writes are random or sequential.
> My test time goes down from few hours to seconds if I add to the end of
> index.
> As for me, best comparision would be to make two equal int4 columns with
> same data as in int8, two indexes, then perform the test. My bet it will be
> slower than int8.
>
> On Thursday, August 4, 2011, Robert Ayrapetyan
> <robert.ayrapetyan@comodo.com> wrote:
>> All you are saying disproves following:
>>
>> in experiment I replaces bigint index:
>>
>> CREATE INDEX ix_t_big ON test.t USING btree (id_big) TABLESPACE tblsp_ix;
>>
>> with 4 (!) other indexes:
>>
>>>> If you look at the rest of my mail - you would notice 50 times
>>>> difference in performance.
>>>> What you would say?
>>>
>>> That accessing a page from RAM is more than 50 times as fast as a
>>> random access of that page from disk.
>>>
>>> -Kevin
>>>
>>
>>
>>
>> --
>> Ayrapetyan Robert,
>> Comodo Anti-Malware Data Processing Analysis and Management System
>> (CAMDPAMS)
>> http://repo-qa.camdpams.odessa.office.comodo.net/mediawiki/index.php
>>
>
> --
> Best regards,
>  Vitalii Tymchyshyn
>



--
Ayrapetyan Robert,
Comodo Anti-Malware Data Processing Analysis and Management System (CAMDPAMS)
http://repo-qa.camdpams.odessa.office.comodo.net/mediawiki/index.php