Re: Fix for seg picksplit function

Поиск
Список
Период
Сортировка
От Alexander Korotkov
Тема Re: Fix for seg picksplit function
Дата
Msg-id AANLkTikbEpKcury9-YeudKBVs71iQWZsv2-GCfAtC7kx@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Fix for seg picksplit function  (Robert Haas <robertmhaas@gmail.com>)
Ответы Re: Fix for seg picksplit function  (Robert Haas <robertmhaas@gmail.com>)
Список pgsql-hackers
On Tue, Nov 16, 2010 at 3:07 AM, Robert Haas <robertmhaas@gmail.com> wrote:
But on a broader note, I'm not very certain the sorting algorithm is
sensible.  For example, suppose you have 10 segments that are exactly
'0' and 20 segments that are exactly '1'.  Maybe I'm misunderstanding,
but it seems like this will result in a 15/15 split when we almost
certainly want a 10/20 split.  I think there will be problems in more
complex cases as well.  The documentation says about the less-than and
greater-than operators that "These operators do not make a lot of
sense for any practical purpose but sorting."
In order to illustrate a real problem we should think about gist behavior with great enough amount of data. For example, I tried to extrapolate this case to 100000 of segs where 40% are (0,1) segs and 60% are (1,2) segs. And this case doesn't seem a problem for me. 
Here are the details.

test=# insert into seg_test (select case when random()<0.4 then '0..1'::seg else '1..2'::seg end from generate_series(1,100000));
INSERT 0 100000
Time: 7543,941 ms

test=# create index seg_test_idx on seg_test using gist(a);
CREATE INDEX
Time: 17839,450 ms

test=# explain (analyze, buffers) select count(*) from seg_test where a @> '1.5'::seg;
                                                            QUERY PLAN                                             
                
-----------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=344.84..344.85 rows=1 width=0) (actual time=186.192..186.193 rows=1 loops=1)
   Buffers: shared hit=887
   ->  Bitmap Heap Scan on seg_test  (cost=5.05..344.59 rows=100 width=0) (actual time=16.066..102.586 rows=60206 l
oops=1)
         Recheck Cond: (a @> '1.5'::seg)
         Buffers: shared hit=887
         ->  Bitmap Index Scan on seg_test_idx  (cost=0.00..5.03 rows=100 width=0) (actual time=15.948..15.948 rows
=60206 loops=1)
               Index Cond: (a @> '1.5'::seg)
               Buffers: shared hit=396
 Total runtime: 186.306 ms
(9 rows)

test=# explain (analyze, buffers) select count(*) from seg_test where a @> '0.5'::seg;
                                                            QUERY PLAN                                             
                
-----------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=344.84..344.85 rows=1 width=0) (actual time=144.293..144.295 rows=1 loops=1)
   Buffers: shared hit=754
   ->  Bitmap Heap Scan on seg_test  (cost=5.05..344.59 rows=100 width=0) (actual time=28.926..87.625 rows=39794 lo
ops=1)
         Recheck Cond: (a @> '0.5'::seg)
         Buffers: shared hit=754
         ->  Bitmap Index Scan on seg_test_idx  (cost=0.00..5.03 rows=100 width=0) (actual time=26.969..26.969 rows
=39794 loops=1)
               Index Cond: (a @> '0.5'::seg)
               Buffers: shared hit=263
 Total runtime: 144.374 ms
(9 rows)

test=# select pg_relpages('seg_test_idx');
 pg_relpages 
-------------
         656
(1 row)

Total number of pages in index is 656 and number of pages used in scans is 263+396=659. Only 3 pages overhead.
We can see the distribution of segs in index using gevel.

test=# select a, count(1) from gist_print('seg_test_idx') as t(level int, valid bool, a seg) group by a;
   a    | count 
--------+-------
 0 .. 1 | 40054
 0 .. 2 |     2
 1 .. 2 | 60599
(3 rows)

test=# select level, a, count(1) from gist_print('seg_test_idx') as t(level int, valid bool, a seg) group by level,a order by level, a;
 level |   a    | count 
-------+--------+-------
     1 | 0 .. 1 |     1
     1 | 0 .. 2 |     1
     1 | 1 .. 2 |     1
     2 | 0 .. 1 |   259
     2 | 0 .. 2 |     1
     2 | 1 .. 2 |   392
     3 | 0 .. 1 | 39794
     3 | 1 .. 2 | 60206
(8 rows)

We have only 2 segs '0..2' (one on the first level and one on the second) and all other segs are '0..1' and '1..2'. I think there will be more problems when we'll have many of distinct values where each have count value about half of index page.

----
With best regards,
Alexander Korotkov.

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Dave Page
Дата:
Сообщение: Re: Isn't HANDLE 64 bits on Win64?
Следующее
От: Magnus Hagander
Дата:
Сообщение: Re: Re: [BUGS] BUG #5650: Postgres service showing as stopped when in fact it is running