Обсуждение: [HACKERS] WARM and indirect indexes

Поиск
Список
Период
Сортировка

[HACKERS] WARM and indirect indexes

От
Alvaro Herrera
Дата:
Two options are on the table to attack the problem of updates causing
write amplification: WARM and indirect indexes.  They are completely
different approaches but have overlapping effects on what scenarios are
improved.  Here's a recap of both features, with the intent that we make
a well-considered decision about each.

The main effect of both features is that an updated tuple doesn't
require updating indexes that are on unmodified columns.  Indirect
indexes are a completely new server feature which may enable other
refinements later on; WARM is a targeted optimization on top of the HOT
optimization.  

The big advantage of WARM is that it works automatically, like HOT: the
user doesn't need to do anything different than today to get the
benefit.  With indirect indexes, the user needs to create the index as
indirect explicitely.

There are two big disadvantages to WARM (as to HOT): it cannot be
applied when the heap page is full; and concurrent long-running
transactions can spoil reclaimability of recently dead tuples in heap
pages.  There's a further disadvantage: currently, there can be only one
WARM update in an update chain.  (Pavan believes it is possible to allow
multiple ones.)  All those cases can benefit from indirect indexes.

Another interesting case is a table with a primary key and a JSON
object, on which you have a GIN index (or an int[] array, or text
search).  What happens if you modify the JSON?  With WARM, this is just
a normal index update.  With indirect indexes, you may be able to skip
inserting index entries for all the JSON elements except those which
changed.  (However, this is not implemented yet.)

- When scanning a WARM-updated block starting from an index, you may
need to do more work to walk the update chain until you find the visible
tuple.  Most of the time, HOT/WARM chains are very short thanks to HOT
pruning, so this shouldn't be a problem.

- Indirect indexes require a primary key to be present.  If the PK is
dropped, the IndIx must be dropped too.

- Indirect indexes become larger if the primary key is wide.

- Indirect indexes are not fully implemented yet (need to remove
restriction of PK value being 6 bytes; also wholesale vacuuming, though
there's no universal agreement that this is strictly necessary.)

- An indirect index takes longer to read, since it needs to descend both
the IndIx itself and the primary key index.

-- 
Álvaro Herrera



Re: [HACKERS] WARM and indirect indexes

От
Bruce Momjian
Дата:
On Tue, Jan 10, 2017 at 04:24:42PM -0300, Alvaro Herrera wrote:
> Two options are on the table to attack the problem of updates causing
> write amplification: WARM and indirect indexes.  They are completely
> different approaches but have overlapping effects on what scenarios are
> improved.  Here's a recap of both features, with the intent that we make
> a well-considered decision about each.
> 
> The main effect of both features is that an updated tuple doesn't
> require updating indexes that are on unmodified columns.  Indirect
> indexes are a completely new server feature which may enable other
> refinements later on; WARM is a targeted optimization on top of the HOT
> optimization.  
> 
> The big advantage of WARM is that it works automatically, like HOT: the
> user doesn't need to do anything different than today to get the
> benefit.  With indirect indexes, the user needs to create the index as
> indirect explicitely.

Thank you for the summary.  I think we have to consider two things with
indirect indexes:

1.  What percentage speedup is the _average_ user going to get?  You
have to consider people who will use indirect indexes who get no benefit
or a net slowdown, and users who will get a benefit.

2.  What percentage of users are going to use indirect indexes?

So, for #1 you might have users who are getting +1%, +50%, and -20%, so
maybe +10% average, and for #2 you might have 0.1%.  When you multiply
them out, you get 0.01% average improvement per installation, which is
very small.  Obviously, these are just wild guesses, but this is just to
make a point.  If you assume WARM has been optimized, #1 gets even
lower.

I am not saying we shouldn't do it, but I am afraid that the complexity
in figuring out when to use indirect indexes, combined with the number
of users who will try them, really hurts its inclusion.

FYI, we have a similar issue in adding GUC variables, which I outlined
in this blog post:
http://momjian.us/main/blogs/pgblog/2009.html#January_10_2009

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +



Re: [HACKERS] WARM and indirect indexes

От
Alvaro Herrera
Дата:
Bruce Momjian wrote:

> 1.  What percentage speedup is the _average_ user going to get?  You
> have to consider people who will use indirect indexes who get no benefit
> or a net slowdown, and users who will get a benefit.
> 
> 2.  What percentage of users are going to use indirect indexes?
> 
> So, for #1 you might have users who are getting +1%, +50%, and -20%, so
> maybe +10% average, and for #2 you might have 0.1%.  When you multiply
> them out, you get 0.01% average improvement per installation, which is
> very small.  Obviously, these are just wild guesses, but this is just to
> make a point.

Perhaps not many users will require indirect indexes; but for those that
do, the feature might be invaluable.  We don't do only things that
benefit everybody -- some features are there to satisfy small
populations ("snapshot too old" is a recent example).  We should of
course do, and perhaps even favor doing things that benefit everybody,
but should also do the other things.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] WARM and indirect indexes

От
Bruce Momjian
Дата:
On Tue, Jan 10, 2017 at 11:36:24PM -0300, Alvaro Herrera wrote:
> Bruce Momjian wrote:
> 
> > 1.  What percentage speedup is the _average_ user going to get?  You
> > have to consider people who will use indirect indexes who get no benefit
> > or a net slowdown, and users who will get a benefit.
> > 
> > 2.  What percentage of users are going to use indirect indexes?
> > 
> > So, for #1 you might have users who are getting +1%, +50%, and -20%, so
> > maybe +10% average, and for #2 you might have 0.1%.  When you multiply
> > them out, you get 0.01% average improvement per installation, which is
> > very small.  Obviously, these are just wild guesses, but this is just to
> > make a point.
> 
> Perhaps not many users will require indirect indexes; but for those that
> do, the feature might be invaluable.  We don't do only things that
> benefit everybody -- some features are there to satisfy small
> populations ("snapshot too old" is a recent example).  We should of
> course do, and perhaps even favor doing things that benefit everybody,
> but should also do the other things.

I never said "We should do only things that benefit everybody," so why
are you saying that?  You are arguing against something I didn't say. I
am trying to make a balanced analysis, and you arguing against an
extreme position.

My point is that anything you add must be weighed against the value it
gives to users who use it, and the percentage of users who will use it.
Against that benefit, you have to look at the cost of exposing that API
to users, code complexity, maintenance, etc.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +



Re: [HACKERS] WARM and indirect indexes

От
Alvaro Herrera
Дата:
Bruce Momjian wrote:

> My point is that anything you add must be weighed against the value it
> gives to users who use it, and the percentage of users who will use it.
> Against that benefit, you have to look at the cost of exposing that API
> to users, code complexity, maintenance, etc.

I agree.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] WARM and indirect indexes

От
Bruce Momjian
Дата:
On Tue, Jan 10, 2017 at 09:25:05PM -0500, Bruce Momjian wrote:
> Thank you for the summary.  I think we have to consider two things with
> indirect indexes:
> 
> 1.  What percentage speedup is the _average_ user going to get?  You
> have to consider people who will use indirect indexes who get no benefit
> or a net slowdown, and users who will get a benefit.
> 
> 2.  What percentage of users are going to use indirect indexes?
> 
> So, for #1 you might have users who are getting +1%, +50%, and -20%, so
> maybe +10% average, and for #2 you might have 0.1%.  When you multiply
> them out, you get 0.01% average improvement per installation, which is
> very small.  Obviously, these are just wild guesses, but this is just to
> make a point.  If you assume WARM has been optimized, #1 gets even
> lower.

Sorry to have to reply to my own email but I need some of the text
above.

Basically, with WARM, the adoption rate (#2) is 100%.

I am asking what instructions we will give users for #1 to prevent
people from using indirect indexes and getting worse performance.  Are
we going to say, "Use indirect indexes on columns that are updated
frequently?"  Actually, that seems pretty clear and would be easy for
users to follow.

I think the big question is that we will not know the benefits of
indirect indexes over WARM until we have implemented WARM, and if the
benefits of indirect indexes over WARM are small, and considering #2, we
might decide that it isn't worth adding it, for the reasons I already
outlined.

Therefore, I think we need WARM done first, then we can test indirect
indexes to see if they are a sufficient win to add it for the small
percentage of users who will use it.

In general, Postgres doesn't support ever possible performance tuning
option, and I think we are better for that because Postgres is simpler
to use.  Going back to my blog post, if you add a feature, every user
who is considering tuning Postgres has to understand the feature and
decide if they should use it, so even for people who don't user the
feature, there is a cost, however small.

In summary, I love WARM, and might love indirect indexes too, but I need
to feel that indirect indexes are a clear win for the added complexity,
both in our code, and for the user API.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +



Re: [HACKERS] WARM and indirect indexes

От
Alvaro Herrera
Дата:
Bruce Momjian wrote:

> Therefore, I think we need WARM done first, then we can test indirect
> indexes to see if they are a sufficient win to add it for the small
> percentage of users who will use it.

Agreed -- that's my plan.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] WARM and indirect indexes

От
Bruce Momjian
Дата:
On Wed, Jan 11, 2017 at 12:24:55PM -0300, Alvaro Herrera wrote:
> Bruce Momjian wrote:
> 
> > Therefore, I think we need WARM done first, then we can test indirect
> > indexes to see if they are a sufficient win to add it for the small
> > percentage of users who will use it.
> 
> Agreed -- that's my plan.

Thanks.  I hate to pour cold water on a feature, honestly, but I don't
want us to over-react to the write amplification problem either.  We
don't want to have X features to improve it when a single feature is
sufficient.  (Amit might be right that the real win for indirect indexes
will be some type of clustered index, where the win might be larger.)

I know Uber dumped us "allegedly" over this issue (among other
complaints), but I am concerned we are overreacting if we change
Postgres too much to address this concern.  Hence, I am arguing we don't
add both features at the same time without evaluating the need for the
second feature after the first feature is done.

Let me give an example of us not over-reacting.  When we implemented HOT
(thanks Pavan), we considered the problem that our default fill factor
for heap is 100%, so there is no room for HOT updates on a full page.
Should we reduce the default fill factor when adding HOT?  We decided
not to, on the hope that the first update to a row on a full page would
put the new row on a page with sufficient free space for future HOT
updates, and that has proven to be the case.

We do document lower full factors for heap to improve HOT updates, but I
think everyone feels that is mostly for good performance after the
initial table load, and that over time the frequently-updated rows will
naturally migrate to pages with sufficient free space.

My point is that we didn't over-react in that case, and the result was
fine --- read-only rows got dense storage, and frequently-updated rows
got sufficient free space for HOT, but we had to push HOT into
production to confirm we were in good shape.  I am thinking we need to
complete WARM to figure out what it doesn't do well in production so we
can fairly evaluate indirect indexes.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +