Обсуждение: TODO-Item: B-tree fillfactor control

Поиск
Список
Период
Сортировка

TODO-Item: B-tree fillfactor control

От
ITAGAKI Takahiro
Дата:
Hi Hackers,

I'm trying the following TODO item:
  [Indexes]
    - Add fillfactor to control reserved free space during index creation

I have already made an patch and it seemed to work well.

----
$ ./pgbench -i -s 10
# select relpages from pg_class where relname = 'accounts_pkey';
relpages | 2745    ( default PCTFree is 10% )

# set btree_leaf_free_percent = 0;
# reindex index accounts_pkey;
# select relpages from pg_class where relname = 'accounts_pkey';
relpages | 2475    ( <- about 2745 * 0.9 = 2470.5 )

# set btree_leaf_free_percent = 30;
# reindex index accounts_pkey;
# select relpages from pg_class where relname = 'accounts_pkey';
relpages | 3537    ( <- about 2745 * 0.9 / 0.7 = 3529.3 )
----

And now, I need advice on some issues.

- Is it appropriate to use GUC variables to control fillfactors?
    Is it better to extend CREATE INDEX / REINDEX grammar?
- Should indexes remember their fillfactors when they are created?
    The last fillfactors will be used on next reindex.
- Is fillfactor useful for hash and gist indexes?
    I think hash does not need it, but gist might need it.

Look forward to your comments.
Thanks,

---
ITAGAKI Takahiro
NTT Cyber Space Laboratories

Вложения

Re: TODO-Item: B-tree fillfactor control

От
Bruce Momjian
Дата:
ITAGAKI Takahiro wrote:
> Hi Hackers,
> 
> I'm trying the following TODO item:
>   [Indexes]
>     - Add fillfactor to control reserved free space during index creation 
> 
> I have already made an patch and it seemed to work well.

Great.

> And now, I need advice on some issues.
> 
> - Is it appropriate to use GUC variables to control fillfactors?
>     Is it better to extend CREATE INDEX / REINDEX grammar?

I think it has to be part of CREATE INDEX and ALTER INDEX.

Is there a use for separate node and leaf settings?

This patch needs documentation, and if we have separate items, we are
going to have to explain when to use node or leaf.

> - Should indexes remember their fillfactors when they are created?
>     The last fillfactors will be used on next reindex.

They should remember, for sure, and REINDEX should use it.  It think
this is similar to the ALTER TABLE ALTER [ COLUMN ] ... SET STATISTICS 
functionality.  It will need to be dumped as well by pg_dump.  If you
need help with any of this, let me know.

> - Is fillfactor useful for hash and gist indexes?
>     I think hash does not need it, but gist might need it.

Not sure.  We don't know what type of index a GIST will be so we have no
way of knowing.  I am thinking we can implement just btree now and the
GIST folks can add it later if they want.  My guess is that each GIST is
going to behave differently for different fill-factors, so if allow it
to be set for GIST, GIST developers can pull the value if they want.

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
359-1001+  If your life is a hard drive,     |  13 Roberts Road +  Christ can be your backup.        |  Newtown Square,
Pennsylvania19073
 


Re: TODO-Item: B-tree fillfactor control

От
Tom Lane
Дата:
Bruce Momjian <pgman@candle.pha.pa.us> writes:
>> - Should indexes remember their fillfactors when they are created?
>> The last fillfactors will be used on next reindex.

> They should remember, for sure, and REINDEX should use it.  It think
> this is similar to the ALTER TABLE ALTER [ COLUMN ] ... SET STATISTICS 
> functionality.  It will need to be dumped as well by pg_dump.

If you want it to be dumped by pg_dump (which is debatable IMHO) then
it MUST NOT be a syntax extension, it has to be driven by a GUC
variable, else we have compatibility problems with the dumps.  We just
went through this with WITH/WITHOUT OIDS.
        regards, tom lane


Re: TODO-Item: B-tree fillfactor control

От
Bruce Momjian
Дата:
Tom Lane wrote:
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> >> - Should indexes remember their fillfactors when they are created?
> >> The last fillfactors will be used on next reindex.
> 
> > They should remember, for sure, and REINDEX should use it.  It think
> > this is similar to the ALTER TABLE ALTER [ COLUMN ] ... SET STATISTICS 
> > functionality.  It will need to be dumped as well by pg_dump.
> 
> If you want it to be dumped by pg_dump (which is debatable IMHO) then
> it MUST NOT be a syntax extension, it has to be driven by a GUC
> variable, else we have compatibility problems with the dumps.  We just
> went through this with WITH/WITHOUT OIDS.

OK, so we are going to issue a GUC to set the fill factor in pg_dump,
but still have a fillfactor syntax for use by users?  That is how we do
WITH/WITHOUT OIDS.

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
359-1001+  If your life is a hard drive,     |  13 Roberts Road +  Christ can be your backup.        |  Newtown Square,
Pennsylvania19073
 


Re: TODO-Item: B-tree fillfactor control

От
Christopher Kings-Lynne
Дата:
> If you want it to be dumped by pg_dump (which is debatable IMHO) then
> it MUST NOT be a syntax extension, it has to be driven by a GUC
> variable, else we have compatibility problems with the dumps.  We just
> went through this with WITH/WITHOUT OIDS.

Compatibility problems?  CREATE INDEX isn't an SQL standard command is it?

Chris



Re: TODO-Item: B-tree fillfactor control

От
Tom Lane
Дата:
Christopher Kings-Lynne <chriskl@familyhealth.com.au> writes:
> Compatibility problems?  CREATE INDEX isn't an SQL standard command is it?

No, but it'll cause unnecessary cross-version compatibility issues for
us.
        regards, tom lane


Re: TODO-Item: B-tree fillfactor control

От
Bruce Momjian
Дата:
Tom Lane wrote:
> Christopher Kings-Lynne <chriskl@familyhealth.com.au> writes:
> > Compatibility problems?  CREATE INDEX isn't an SQL standard command is it?
> 
> No, but it'll cause unnecessary cross-version compatibility issues for
> us.

It is true it isn't SQL standard, but I think our CREATE INDEX syntax
matches many vendor's syntax in most cases.

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
359-1001+  If your life is a hard drive,     |  13 Roberts Road +  Christ can be your backup.        |  Newtown Square,
Pennsylvania19073
 


Re: TODO-Item: B-tree fillfactor control

От
ITAGAKI Takahiro
Дата:
Bruce Momjian <pgman@candle.pha.pa.us> wrote:

> > - Is fillfactor useful for hash and gist indexes?
> >     I think hash does not need it, but gist might need it.
> 
> Not sure.  We don't know what type of index a GIST will be so we have no
> way of knowing.  I am thinking we can implement just btree now and the
> GIST folks can add it later if they want.  My guess is that each GIST is
> going to behave differently for different fill-factors, so if allow it
> to be set for GIST, GIST developers can pull the value if they want.

My understanding about hash was wrong. It uses fill factor of 75%, which is
hard-coded. On the other hand, GIST has no ability to control fill factor
currently. I'm trying to add fill factors to hash and gist, so I'll ask
index developers to review a patch in the future.


> > - Is it appropriate to use GUC variables to control fillfactors?
> >     Is it better to extend CREATE INDEX / REINDEX grammar?
> 
> I think it has to be part of CREATE INDEX and ALTER INDEX.

SQL standard has no regulation for indexes, so I refered to other databases. - Oracle and DB2 : CREATE INDEX index ON
table(...) PCTFREE 30; - MS SQL Server  : CREATE INDEX index ON table (...) WITH FILLFACTOR = 70;
 

PCTFREE seems to be common, so I'll extend DDL to use PCTFREE syntax.
The following two syntaxes will be able to be used.  1. SET btree_free_percent = 30;    CREATE INDEX index ON table
(...);   SET btree_free_percent = 10; -- revert 2. CREATE INDEX index ON table (...) PCTFREE 30;
 

1 would be useful for a compatibe pg_dump format, per suggestion from Tom.


> Is there a use for separate node and leaf settings?

We should use different settings for leaf and node, but it may confuse users.
So I'll simplify the setting as follows:       node_free_percent = Min(30%, 3 * leaf_free_percent)
When leaf_free_percent is 10%, node_free_percent is 30%. They are the same
values of the current implementation.

---
ITAGAKI Takahiro
NTT Cyber Space Laboratories




Re: TODO-Item: B-tree fillfactor control

От
Bruce Momjian
Дата:
ITAGAKI Takahiro wrote:
> Bruce Momjian <pgman@candle.pha.pa.us> wrote:
> 
> > > - Is fillfactor useful for hash and gist indexes?
> > >     I think hash does not need it, but gist might need it.
> > 
> > Not sure.  We don't know what type of index a GIST will be so we have no
> > way of knowing.  I am thinking we can implement just btree now and the
> > GIST folks can add it later if they want.  My guess is that each GIST is
> > going to behave differently for different fill-factors, so if allow it
> > to be set for GIST, GIST developers can pull the value if they want.
> 
> My understanding about hash was wrong. It uses fill factor of 75%, which is
> hard-coded. On the other hand, GIST has no ability to control fill factor
> currently. I'm trying to add fill factors to hash and gist, so I'll ask
> index developers to review a patch in the future.

OK.

> > > - Is it appropriate to use GUC variables to control fillfactors?
> > >     Is it better to extend CREATE INDEX / REINDEX grammar?
> > 
> > I think it has to be part of CREATE INDEX and ALTER INDEX.
> 
> SQL standard has no regulation for indexes, so I refered to other databases.
>   - Oracle and DB2 : CREATE INDEX index ON table (...) PCTFREE 30;
>   - MS SQL Server  : CREATE INDEX index ON table (...) WITH FILLFACTOR = 70;
> 
> PCTFREE seems to be common, so I'll extend DDL to use PCTFREE syntax.
> The following two syntaxes will be able to be used. 
>   1. SET btree_free_percent = 30;
>      CREATE INDEX index ON table (...);
>      SET btree_free_percent = 10; -- revert
>   2. CREATE INDEX index ON table (...) PCTFREE 30;
> 
> 1 would be useful for a compatibe pg_dump format, per suggestion from Tom.

I personally like FILLFACTOR, but I understand the desire to match
Oracle.  PCTFREE seems too abreviated for me, but it would match the GUC
better, so maybe it is the best.

> > Is there a use for separate node and leaf settings?
> 
> We should use different settings for leaf and node, but it may confuse users.
> So I'll simplify the setting as follows:
>         node_free_percent = Min(30%, 3 * leaf_free_percent)
> When leaf_free_percent is 10%, node_free_percent is 30%. They are the same
> values of the current implementation.

Yes, I think that is ideal.

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
359-1001+  If your life is a hard drive,     |  13 Roberts Road +  Christ can be your backup.        |  Newtown Square,
Pennsylvania19073