Why is creating a GIN table index so much slower than inserting data into an empty table with the same index?
From: Sergey Burladyan
Date:
example:
select version();
version
--------------------------------------------------------------------------------------------
PostgreSQL 8.3.6 on i486-pc-linux-gnu, compiled by GCC gcc-4.3.real (Debian 4.3.3-3) 4.3.3
show maintenance_work_mem ;
maintenance_work_mem
----------------------
128MB
create table a (i1 int, i2 int, i3 int, i4 int, i5 int, i6 int);
insert into a select n, n, n, n, n, n from generate_series(1, 100000) as n;
INSERT 0 100000
Time: 570.110 ms
create index arr_gin on a using gin ( (array[i1, i2, i3, i4, i5, i6]) );
CREATE INDEX
Time: 203068.314 ms
truncate a;
drop index arr_gin ;
create index arr_gin on a using gin ( (array[i1, i2, i3, i4, i5, i6]) );
CREATE INDEX
Time: 3.246 ms
insert into a select n, n, n, n, n, n from generate_series(1, 100000) as n;
INSERT 0 100000
Time: 2405.481 ms
select pg_size_pretty(pg_total_relation_size('a')) as total,
       pg_size_pretty(pg_relation_size('a')) as "table";
total | table
---------+---------
9792 kB | 5096 kB
203068.314 ms vs. 2405.481 ms, is this behaviour normal?
Thanks!
--
Sergey Burladyan
Sergey Burladyan <eshkinkot@gmail.com> writes:
> show maintenance_work_mem ;
> maintenance_work_mem
> ----------------------
> 128MB
> create table a (i1 int, i2 int, i3 int, i4 int, i5 int, i6 int);
> insert into a select n, n, n, n, n, n from generate_series(1, 100000) as n;
> create index arr_gin on a using gin ( (array[i1, i2, i3, i4, i5, i6]) );
[ takes forever ]
Seems the reason this is so awful is that the incoming data is exactly
presorted, meaning that the tree structure that ginInsertEntry() is
trying to build degenerates to a linear list (ie, every new element
becomes the right child of the prior one). So the processing becomes
O(N^2) up till you reach maintenance_work_mem and flush the tree. With
a large setting for maintenance_work_mem it gets spectacularly slow.
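For illustration, here is a minimal standalone sketch of that effect (this is not the actual ginInsertEntry() code; the node layout, the function names, and the 20000-key loop are made up for the example):

#include <stdio.h>
#include <stdlib.h>

/* One node of a plain, unbalanced binary tree, standing in for the
 * entry accumulator used during the index build. */
typedef struct node
{
    int          key;
    struct node *left;
    struct node *right;
} node;

static long comparisons = 0;

static node *
tree_insert(node *root, int key)
{
    node   *n = malloc(sizeof(node));
    node  **slot = &root;

    n->key = key;
    n->left = n->right = NULL;

    /* Walk down from the root.  With presorted keys every step goes to
     * the right child, so the "tree" is really a linked list and the
     * i-th insert costs i comparisons. */
    while (*slot != NULL)
    {
        comparisons++;
        if (key < (*slot)->key)
            slot = &(*slot)->left;
        else
            slot = &(*slot)->right;
    }
    *slot = n;
    return root;
}

int
main(void)
{
    node   *root = NULL;

    for (int i = 1; i <= 20000; i++)    /* presorted input, like generate_series() */
        root = tree_insert(root, i);

    /* Prints 199990000 = 20000 * 19999 / 2 comparisons: O(N^2) growth. */
    printf("%ld comparisons for 20000 presorted inserts\n", comparisons);
    return 0;
}

With shuffled keys the same sketch does only on the order of N log N comparisons on average, which is why the slowdown shows up specifically with presorted input.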
I think a reasonable solution for this might be to keep an eye on
maxdepth and force a flush if that gets too large (more than a few
hundred, perhaps?). Something like this:
        /* If we've maxed out our available memory, dump everything to the index */
+       /* Also dump if the tree seems to be getting too unbalanced */
-       if (buildstate->accum.allocatedMemory >= maintenance_work_mem * 1024L)
+       if (buildstate->accum.allocatedMemory >= maintenance_work_mem * 1024L ||
+           buildstate->accum.maxdepth > DEPTH_LIMIT)
        {
The new fast-insert code likely will need a similar defense.
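As a minimal sketch of what such a defense might look like, assuming the accumulator already tracks its deepest insertion path (the DEPTH_LIMIT value of 300, the struct layout, and dumpAccumulator() are placeholders, not the actual GIN build code):

#define DEPTH_LIMIT 300         /* "a few hundred"; the exact value is an assumption */

/* Hypothetical accumulator state, standing in for buildstate->accum. */
typedef struct
{
    long    allocatedMemory;    /* bytes used by the entry tree */
    int     maxdepth;           /* deepest insertion path since the last flush */
    /* ... the entry tree itself would live here ... */
} Accumulator;

/* Placeholder: the real code would write the accumulated entries into
 * the index and release the tree before resetting the counters. */
static void
dumpAccumulator(Accumulator *accum)
{
    accum->allocatedMemory = 0;
    accum->maxdepth = 0;
}

/*
 * Called after each heap tuple is added to the accumulator: flush when
 * memory runs out (the existing check) or when the tree has degenerated
 * into a long chain (the proposed check).
 */
static void
maybe_flush(Accumulator *accum, long maintenance_work_mem_kb)
{
    if (accum->allocatedMemory >= maintenance_work_mem_kb * 1024L ||
        accum->maxdepth > DEPTH_LIMIT)
        dumpAccumulator(accum);
}

int
main(void)
{
    Accumulator accum = {0, 0};

    accum.maxdepth = 500;               /* pretend the tree became a long chain */
    maybe_flush(&accum, 128 * 1024L);   /* 128MB, well below the memory limit */
    return accum.maxdepth;              /* 0: the depth check forced a flush */
}

With the depth capped this way, each insertion does at most DEPTH_LIMIT comparisons, so the work between flushes is bounded by O(N * DEPTH_LIMIT) instead of O(N^2).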
Comments?
regards, tom lane
Re: Why is creating a GIN table index so much slower than inserting data into an empty table with the same index?
From: Heikki Linnakangas
Date:
Tom Lane wrote:
> Sergey Burladyan <eshkinkot@gmail.com> writes:
>> show maintenance_work_mem ;
>> maintenance_work_mem
>> ----------------------
>> 128MB
>
>> create table a (i1 int, i2 int, i3 int, i4 int, i5 int, i6 int);
>> insert into a select n, n, n, n, n, n from generate_series(1, 100000) as n;
>> create index arr_gin on a using gin ( (array[i1, i2, i3, i4, i5, i6]) );
>
> [ takes forever ]
>
> Seems the reason this is so awful is that the incoming data is exactly
> presorted, meaning that the tree structure that ginInsertEntry() is
> trying to build degenerates to a linear list (ie, every new element
> becomes the right child of the prior one). So the processing becomes
> O(N^2) up till you reach maintenance_work_mem and flush the tree. With
> a large setting for maintenance_work_mem it gets spectacularly slow.

Yes, this is probably the same issue I bumped into a while ago:
http://archives.postgresql.org/message-id/49350A13.3020105@enterprisedb.com

> I think a reasonable solution for this might be to keep an eye on
> maxdepth and force a flush if that gets too large (more than a few
> hundred, perhaps?). Something like this:
>
>         /* If we've maxed out our available memory, dump everything to the index */
> +       /* Also dump if the tree seems to be getting too unbalanced */
> -       if (buildstate->accum.allocatedMemory >= maintenance_work_mem * 1024L)
> +       if (buildstate->accum.allocatedMemory >= maintenance_work_mem * 1024L ||
> +           buildstate->accum.maxdepth > DEPTH_LIMIT)
>         {
>
> The new fast-insert code likely will need a similar defense.

I fooled around with a balanced tree, which solved the problem but
unfortunately made the unsorted case slower. Limiting the depth like that
should avoid that, so it's worth trying.

--
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com
Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:
> Tom Lane wrote:
>> I think a reasonable solution for this might be to keep an eye on
>> maxdepth and force a flush if that gets too large (more than a few
>> hundred, perhaps?). Something like this:
> I fooled around with a balanced tree, which solved the problem but
> unfortunately made the unsorted case slower.
Yeah, rebalancing the search tree would fix that, but every balanced
tree algorithm I know about is complicated, slow, and needs extra
memory. It's really unclear that it'd be worth the trouble for a
transient data structure like this one.
regards, tom lane