Re: Fix for seg picksplit function
От | Alexander Korotkov |
---|---|
Тема | Re: Fix for seg picksplit function |
Дата | |
Msg-id | AANLkTikbEpKcury9-YeudKBVs71iQWZsv2-GCfAtC7kx@mail.gmail.com обсуждение исходный текст |
Ответ на | Re: Fix for seg picksplit function (Robert Haas <robertmhaas@gmail.com>) |
Ответы |
Re: Fix for seg picksplit function
(Robert Haas <robertmhaas@gmail.com>)
|
Список | pgsql-hackers |
On Tue, Nov 16, 2010 at 3:07 AM, Robert Haas <robertmhaas@gmail.com> wrote:
----
With best regards,
Alexander Korotkov.
But on a broader note, I'm not very certain the sorting algorithm issensible. For example, suppose you have 10 segments that are exactly
'0' and 20 segments that are exactly '1'. Maybe I'm misunderstanding,
but it seems like this will result in a 15/15 split when we almost
certainly want a 10/20 split. I think there will be problems in more
complex cases as well. The documentation says about the less-than and
greater-than operators that "These operators do not make a lot of
sense for any practical purpose but sorting."
In order to illustrate a real problem we should think about gist behavior with great enough amount of data. For example, I tried to extrapolate this case to 100000 of segs where 40% are (0,1) segs and 60% are (1,2) segs. And this case doesn't seem a problem for me.
Here are the details.
test=# insert into seg_test (select case when random()<0.4 then '0..1'::seg else '1..2'::seg end from generate_series(1,100000));
INSERT 0 100000
Time: 7543,941 ms
test=# create index seg_test_idx on seg_test using gist(a);
CREATE INDEX
Time: 17839,450 ms
test=# explain (analyze, buffers) select count(*) from seg_test where a @> '1.5'::seg;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=344.84..344.85 rows=1 width=0) (actual time=186.192..186.193 rows=1 loops=1)
Buffers: shared hit=887
-> Bitmap Heap Scan on seg_test (cost=5.05..344.59 rows=100 width=0) (actual time=16.066..102.586 rows=60206 l
oops=1)
Recheck Cond: (a @> '1.5'::seg)
Buffers: shared hit=887
-> Bitmap Index Scan on seg_test_idx (cost=0.00..5.03 rows=100 width=0) (actual time=15.948..15.948 rows
=60206 loops=1)
Index Cond: (a @> '1.5'::seg)
Buffers: shared hit=396
Total runtime: 186.306 ms
(9 rows)
test=# explain (analyze, buffers) select count(*) from seg_test where a @> '0.5'::seg;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=344.84..344.85 rows=1 width=0) (actual time=144.293..144.295 rows=1 loops=1)
Buffers: shared hit=754
-> Bitmap Heap Scan on seg_test (cost=5.05..344.59 rows=100 width=0) (actual time=28.926..87.625 rows=39794 lo
ops=1)
Recheck Cond: (a @> '0.5'::seg)
Buffers: shared hit=754
-> Bitmap Index Scan on seg_test_idx (cost=0.00..5.03 rows=100 width=0) (actual time=26.969..26.969 rows
=39794 loops=1)
Index Cond: (a @> '0.5'::seg)
Buffers: shared hit=263
Total runtime: 144.374 ms
(9 rows)
test=# select pg_relpages('seg_test_idx');
pg_relpages
-------------
656
(1 row)
Total number of pages in index is 656 and number of pages used in scans is 263+396=659. Only 3 pages overhead.
We can see the distribution of segs in index using gevel.
test=# select a, count(1) from gist_print('seg_test_idx') as t(level int, valid bool, a seg) group by a;
a | count
--------+-------
0 .. 1 | 40054
0 .. 2 | 2
1 .. 2 | 60599
(3 rows)
test=# select level, a, count(1) from gist_print('seg_test_idx') as t(level int, valid bool, a seg) group by level,a order by level, a;
level | a | count
-------+--------+-------
1 | 0 .. 1 | 1
1 | 0 .. 2 | 1
1 | 1 .. 2 | 1
2 | 0 .. 1 | 259
2 | 0 .. 2 | 1
2 | 1 .. 2 | 392
3 | 0 .. 1 | 39794
3 | 1 .. 2 | 60206
(8 rows)
We have only 2 segs '0..2' (one on the first level and one on the second) and all other segs are '0..1' and '1..2'. I think there will be more problems when we'll have many of distinct values where each have count value about half of index page.
With best regards,
Alexander Korotkov.
В списке pgsql-hackers по дате отправления:
Следующее
От: Magnus HaganderДата:
Сообщение: Re: Re: [BUGS] BUG #5650: Postgres service showing as stopped when in fact it is running