Discussion: query against large table not using sensible index to find very small amount of data

query against large table not using sensible index to find very small amount of data

From:
"Andrew W. Gibbs"
Date:
I have a fairly large table (~100M rows), let's call it "events", and
among other things it has a couple of columns that we'll call
entity_type_id (an integer) and published_at (a timestamp).  It has,
among others, indices on (published_at) and
(entity_type_id, published_at).
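
For concreteness, a minimal sketch of the schema I am describing (the
table and column names are the real ones; the index names and the
omitted columns are made up):

CREATE TABLE events (
    entity_type_id  integer NOT NULL,
    published_at    timestamp NOT NULL
    -- ... many other columns ...
);

CREATE INDEX events_published_at_idx
    ON events (published_at);
CREATE INDEX events_entity_type_id_published_at_idx
    ON events (entity_type_id, published_at);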

A very common query against this table is of the form...

SELECT * FROM events WHERE entity_type_id = XXX ORDER BY published_at DESC LIMIT 25;

... to get the most recent 25 events from the table for a given type
of entity, and generally the query planner does the expected thing of
using the two-part index on (entity_type_id, published_at).  Every now
and again, though, I have found the query planner deciding that it
ought to use the single column (published_at) index.  This can,
unsurprisingly, result in horrendous performance if events for a given
entity type are rare, as we end up with a very long walk of an index.

I had this happen again yesterday and I noticed something of
particular interest pertaining to the event.  Specifically, the query
was for an entity type that the system had only seen for the first
time one day prior, and furthermore the events table had not been
analyzed by the statistics collector for a couple of weeks.

My intuition is that the query planner, when working with an enormous
table, and furthermore encountering an entity type that the statistics
collector had never previously seen, would assume that the number of
rows in the events table of that entity type would be very small, and
therefore the two-part index on (entity_type_id, published_at) would
be the right choice.  Nonetheless, an EXPLAIN was showing usage of the
(published_at) index, and since there were only ~20 rows in the entire
events table for that entity type the queries were getting the worst
possible execution imaginable, i.e. reading in the whole table to find
the rows that hit, but doing it with the random I/O of an index walk.

As an experiment, I ran a VACUUM ANALYZE on the events table, and then
re-ran the EXPLAIN of the query, and...  Same query plan again...
Maybe for whatever issue I am having the random sampling nature of the
statistics collector made it unhelpful, i.e. in its sampling of the
~100M rows it never hit a single row that had the new entity type
specified?
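
In case anyone wants the mechanics, the experiment was roughly this (42
stands in for the id of the new entity type; swapping EXPLAIN for
EXPLAIN ANALYZE would also show actual row counts, at the cost of
really running the query):

VACUUM ANALYZE events;

EXPLAIN
SELECT * FROM events
 WHERE entity_type_id = 42
 ORDER BY published_at DESC
 LIMIT 25;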

Other possibly relevant pieces of information...  The entity type
column has a cardinality in the neighborhood of a couple dozen.
Meanwhile, for some of the entity types there is a large and ongoing
number of events, and for other entity types there is a smaller and
more sporadic number of events.  Every now and again a new entity type
shows up.

I can't understand why the query planner would make this choice.
Maybe it has gotten ideas into its head about the distribution of
data?  Or maybe there is a subtle bug that my data set is triggering?
Or maybe I need to turn some knobs on statistics collection?  Or maybe
it's all of these things together?  I worry that even if there is a
knob-turning exercise that helps, we're still going to get burned
whenever a new entity type shows up until we re-run ANALYZE, assuming
that I can find a fix that involves tweaking statistics collection.  I
just can't fathom how it would ever be the case that Postgres's choice
of index usage in this case would make sense.  It doesn't even slot
cleanly into the problem space of "why did Postgres do a sequential
scan instead of an index scan?".  If you're doing a query of the
described form and the entity type is specified, wouldn't the two-part
index theoretically _always_ yield better performance than the
one-part index?  Maybe I have a flawed understanding of the cost of
using various indexes?  Maybe there is something analogous between
sequential-versus-index-scan and one-part-versus-two-part-index scan
choices?

FWIW, we're running on 8.4.X and using the out-of-the-box
default_statistics_target setting and haven't dabbled with setting
table level statistics configurations.
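
(For completeness, the kind of knob I mean is the per-column statistics
target, e.g. something like the following, where 1000 is just an
example value; we have not actually tried this yet:)

ALTER TABLE events ALTER COLUMN entity_type_id SET STATISTICS 1000;
ANALYZE events;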

Thoughts?  Recommended reading?

  -- AWG


Re: query against large table not using sensible index to find very small amount of data

From:
Shaun Thomas
Date:
> Other possibly relevant pieces of information...  The entity type
> column has a cardinality in the neighborhood of a couple dozen.
> Meanwhile, for some of the entity types there is a large and ongoing
> number of events, and for other entity types there is a smaller and
> more sporadic number of events.  Every now and again a new entity
> type shows up.

With that as the case, I have two questions for you:

1. Why do you have a low cardinality column as the first column in an index?
2. Do you have any queries at all that only use the entity type as the only where clause?

I agree that the planner is probably wrong here, but these choices aren't helping. The low cardinality of the first
column causes very large buckets that don't limit results very well at all. Combined with the order-by clause, the
planner really wants to walk the date index backwards to find results instead. I would do a couple of things.

First, remove the type/date index. Next, do a count of each type in the table with something like this:

SELECT type_id, count(1)
  FROM my_table
 GROUP BY 1;

Any type that is more than 20% of the table will probably never be useful in an index. At this point, you have a
choice. You can create a new index with date and type *in that order* or create a new partial index on date and type
that also ignores the top matches. For instance, if you had a type that was 90% of the values, this would be my
suggestion:

CREATE INDEX idx_foo_table_date_event_type_part ON foo_table (event_date, event_type)
 WHERE event_type != 14;

Or whatever. If the IDs are basically evenly distributed, it won't really matter.

In any case, index order matters. The planner wants to restrict data as quickly as possible. If you provide an order
clause, it wants to read the index in that order. Your specified type as the first column disrupts that, so it has to
fetch the values first, which is usually more expensive. Even if that's wrong in your particular case, planner stats are
not precise enough to know that.

Either way, try moving the indexes around. I can't think of many indexes in our database where I have the low
cardinality value as the first column. Databases have an easier time managing many shallow buckets of values than a few
deep ones.
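
As a sanity check before dropping anything, the standard statistics
views will tell you how often each index is actually getting used;
something along these lines (substituting your real table name):

SELECT indexrelname, idx_scan, idx_tup_read, idx_tup_fetch
  FROM pg_stat_user_indexes
 WHERE relname = 'events';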

--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd | Suite 400 | Chicago IL, 60604
312-676-8870
sthomas@optionshouse.com



Re: query against large table not using sensible index to find very small amount of data

From:
Tom Lane
Date:
"Andrew W. Gibbs" <awgibbs@awgibbs.com> writes:
> A very common query against this table is of the form...

> SELECT * FROM events WHERE entity_type_id = XXX ORDER BY published_at DESC LIMIT 25;

> ... to get the most recent 25 events from the table for a given type
> of entity, and generally the query planner does the expected thing of
> using the two-part index on (entity_type_id, published_at).  Every now
> and again, though, I have found the query planner deciding that it
> ought use the single column (published_at) index.

What is the estimated rows count according to EXPLAIN when it does that,
versus when it chooses the better plan?

> FWIW, we're running on 8.4.X and using the out-of-the-box
> default_statistics_target setting and haven't dabbled with setting
> table level statistics configurations.

8.4.X is due to reach EOL in July, so you really ought to be thinking
about an upgrade.  It's not clear from the given info whether this issue
is fixable with stats configuration adjustments, is a bug already fixed
in later versions, or neither, but we're unlikely to make any significant
changes in the 8.4 planner code at this point...

            regards, tom lane


Re: query against large table not using sensible index to find very small amount of data

From:
Jeff Janes
Date:
On Tue, Apr 8, 2014 at 6:39 AM, Shaun Thomas <sthomas@optionshouse.com> wrote:

> Other possibly relevant pieces of information...  The entity type
> column has a cardinality in the neighborhood of a couple dozen.
> Meanwhile, for some of the entity types there is a large and ongoing
> number of events, and for other entity types there is a smaller and
> more sporadic number of events.  Every now and again a new entity
> type shows up.

With that as the case, I have two questions for you:

1. Why do you have a low cardinality column as the first column in an index?

Because if he didn't have it, the planner would never be able to use it.  Remember, the problem is when the planner chooses NOT to use that index.

Cheers,

Jeff

Re: query against large table not using sensible index to find very small amount of data

From:
"'Andrew W. Gibbs'"
Date:
Your understanding of the utility of multi-part indices does not jibe
with my own.

While I agree that a partial index might be in order here, that ought
just be a performance optimization that lowers the footprint of the
index from an index size and index maintenance standpoint, not
something that governs when the index is used for an item whose entity
type rarely comes up in the table.  If a couple of the entity types
were to constitute 80% of the events, then using a partial index would
reduce the performance strain of maintaining the index by 80%, but
this ought not govern the query planner's behavior when doing queries
on entity types that were not among those.

My general understanding of the utility of multi-part indices is that
they will come into play when some number of the leading columns
appear in the query as fixed values and furthermore if a subsequent
column appears as part of a ranging operation.

I know that a b-tree structure isn't exactly the same as a
binary-tree, but it is roughly equivalent for the purposes of our
conversation...  I believe you can think of multi-part indices as
(roughly) equivalent either to nested binary trees, or as equivalent
to a binary tree whose keys are the concatenation of the various
columns.  In the former case, doing a range scan would be a matter of
hopping through the nested trees until you got to the terminal range
scan operation, and in the latter case doing a range scan would be a
matter of finding the first node in the tree that fell within the
values for your concatenation and then walking through the tree.  Yes,
that's not exactly what happens with a b-tree, but it's pretty
similar, the main differences being performance operations, I believe.

Given that, I don't understand how a multi-part index in which the
column over which I intend to range comes _earlier_ than the column(s)
that I intend to have fixed would be helpful.  This is especially
true given that the timestamp columns are at the granularity of
_milliseconds_ and my data set sees a constant stream of inputs with
bursts up to ~100 events per second.  I think what you are describing
could only make sense if the date column were at a large granularity,
e.g. hours or days.
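
To make that concrete, here is a sketch of the two layouts as I
understand them (index names invented, XXX being the rare entity type):

-- Leading on entity_type_id: all rows for XXX are contiguous in the
-- index and already ordered by published_at, so the newest 25 can be
-- read directly.
CREATE INDEX events_type_then_date ON events (entity_type_id, published_at);

-- Leading on published_at: rows for XXX are scattered throughout, so
--   WHERE entity_type_id = XXX ORDER BY published_at DESC LIMIT 25
-- degenerates into walking the index backwards from the newest entry
-- and filtering, which is exactly the bad plan I am seeing.
CREATE INDEX events_date_then_type ON events (published_at, entity_type_id);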

Or maybe I have missed something...

  -- AWG

On Tue, Apr 08, 2014 at 01:39:41PM +0000, Shaun Thomas wrote:
>
> > Other possibly relevant pieces of information...  The entity type
> > column has a cardinality in the neighborhood of a couple dozen.
> > Meanwhile, for some of the entity types there is a large and ongoing
> > number of events, and for other entity types there is a smaller and
> > more sporadic number of events.  Every now and again a new entity
> > type shows up.
>
> With that as the case, I have two questions for you:
>
> 1. Why do you have a low cardinality column as the first column in an index?
> 2. Do you have any queries at all that only use the entity type as the only where clause?
>
> I agree that the planner is probably wrong here, but these choices aren't helping. The low cardinality of the first
> column causes very large buckets that don't limit results very well at all. Combined with the order-by clause, the
> planner really wants to walk the date index backwards to find results instead. I would do a couple of things.
>
> First, remove the type/date index. Next, do a count of each type in the table with something like this:
>
> SELECT type_id, count(1)
>   FROM my_table
>  GROUP BY 1;
>
> Any type that is more than 20% of the table will probably never be useful in an index. At this point, you have a
> choice. You can create a new index with date and type *in that order* or create a new partial index on date and type
> that also ignores the top matches. For instance, if you had a type that was 90% of the values, this would be my
> suggestion:
>
> CREATE INDEX idx_foo_table_date_event_type_part ON foo_table (event_date, event_type)
>  WHERE event_type != 14;
>
> Or whatever. If the IDs are basically evenly distributed, it won't really matter.
>
> In any case, index order matters. The planner wants to restrict data as quickly as possible. If you provide an order
> clause, it wants to read the index in that order. Your specified type as the first column disrupts that, so it has to
> fetch the values first, which is usually more expensive. Even if that's wrong in your particular case, planner stats are
> not precise enough to know that.
>
> Either way, try moving the indexes around. I can't think of many indexes in our database where I have the low
> cardinality value as the first column. Databases have an easier time managing many shallow buckets of values than a few
> deep ones.
>
> --
> Shaun Thomas
> OptionsHouse | 141 W. Jackson Blvd | Suite 400 | Chicago IL, 60604
> 312-676-8870
> sthomas@optionshouse.com
>


Re: query against large table not using sensible index to find very small amount of data

From:
"Andrew W. Gibbs"
Date:
Tom,

We have continued to explore the issue and one of my teammates,
copied, has made some interesting additional discoveries.

I apparently glossed over a subtle distinction about the query being
issued.  My original reported query structure was of the form...

SELECT * FROM events WHERE entity_type_id = XXX ORDER BY published_at DESC LIMIT 25;

... but in reality it was more like...

SELECT * FROM events WHERE entity_type_id = (SELECT id FROM entity_types WHERE name = ?) ORDER BY published_at DESC LIMIT 25;

... and these queries in fact yield dramatically different cost
analyses, so much so that if you switch to using the former one by
virtue of doing a query yourself for the entity_type_id instead of
using a subquery, then the system uses the two-part index as hoped.

I suspect that this stems in part from the non-even distribution of
the entity_type_id value in the events table for which there are 20-30
values but two or three of them account for a very large share of the
table (and Postgres only seems to track the fraction taken by each of
the top ten or so values).  For the query that originated my
consternation, I presume the planner said "I don't know which value of
entity_type_id the subquery will yield, so I'll assume an average
density based on everything I've seen in the table" (which really
probably means only the top ten values it has seen), whereas when we
hard-code the entity_type_id by doing the sub-query ourselves
beforehand the query planner says "that value must be either really
rare or non-existent because I haven't even seen it since my last
ANALYZE of the table and this table is huge".
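
(As I understand it, the statistics the planner has to work with here
can be inspected directly; something like this shows the tracked
most-common values and their frequencies for the column:)

SELECT n_distinct, most_common_vals, most_common_freqs
  FROM pg_stats
 WHERE tablename = 'events'
   AND attname = 'entity_type_id';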

Maybe this is an inherent limitation of the query planner because it
does not want to explore parts of the plan by actually executing
subqueries so that it can make more informed choices about the larger
query?  We restored a back-up of the system onto another machine, ran
the conversion to Postgres 9, cranked up the stats collection
configurations all the way, ran ANALYZE, and still got the same
results, which leads me to believe that there is an issue with the
query planner regarding its ability to do statistical analysis
pertaining to columns in a WHERE clause being specified by a sub-query
(our entity_types table is extremely small, and presumably thus always
in memory, thus a subquery would be insanely cheap, but I appreciate
that we're way down in the weeds of query planning by this point, and
that there may be fundamental problems with issuing actual queries so
as to do exploratory query planning).

We (Scott, really) continued to explore this (using the original
query, not the tweaked one) by doing a mix of alternately dropping
indexes, tuning execution cost configuration parameters, and clearing
the OS cache between queries.  One of the outcomes from this was the
realization that random_page_cost is the dominant factor in the cost of
the plan involving the two-part index: lowering it from the default 4
to 2 cuts the estimated cost of using the two-part index almost exactly
in half and causes it to be chosen, even though the query planner is
over-estimating the prevalence of the column value due (presumably) to
not knowing how the subquery was going to play out.
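
For the record, that experiment was along these lines, toggling the
setting just for the session ('foo' standing in for the entity type
name we were testing with):

SET random_page_cost = 2;
EXPLAIN
SELECT * FROM events
 WHERE entity_type_id = (SELECT id FROM entity_types WHERE name = 'foo')
 ORDER BY published_at DESC
 LIMIT 25;
RESET random_page_cost;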

This brings me back to my musings about Postgres b-tree index
implementation...  Why should using a two-part index with the WHERE
clause fixing the first column's value yield a query with more random
I/O than walking the single column index and filtering out the
non-matching rows?  Given my understanding of index implementation, it
seems like using the two-part index in even the degenerate case of a
table with only one entity_type_id would yield almost exactly the same
I/O load as using the one-part index, and so a statistical
distribution of the table that was at all better than that degenerate
case would cause selection of the two-part index.  This makes me think
that either this illustrates a second query planner issue or that my
understanding of the implementation of b-tree indexes in Postgres is
flawed.

It seems obvious to me that we need to tweak the cost configuration
parameters in our Postgres installation, at the least lowering
random_page_cost to something more in-line with the hardware we have,
but even that that feels like we would just be skirting issues with
the query planner when either there is a subtle flaw in the planner or
a major flaw in my understanding of b-tree index implementation.

Mind you, I raise these issues as someone who profoundly loves
Postgres, though perhaps is loving it too hard these days.  I would
really like to get a fuller understanding of what is happening here so
as to craft a permanent solution.  I am worried that even if we tweak
one or more of the cost configuration parameters, it might still
be prudent to issue the subquery's look-up prior to the main query and
then embed its results so that the query planner can act with better
knowledge of the specified entity_type_id value's prevalence in the
events table, even though this would feel a little bit like a hack.
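
Concretely, the hack would be something like this, with the application
doing the two steps itself ('foo' and 37 being placeholders):

-- Step 1: resolve the entity type id up front (entity_types is tiny).
SELECT id FROM entity_types WHERE name = 'foo';

-- Step 2: run the main query with the literal id that came back (say
-- 37), so the planner can use its statistics for that specific value.
SELECT * FROM events
 WHERE entity_type_id = 37
 ORDER BY published_at DESC
 LIMIT 25;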

Any insights would be greatly appreciated.

  -- AWG

On Tue, Apr 08, 2014 at 09:55:38AM -0400, Tom Lane wrote:
> "Andrew W. Gibbs" <awgibbs@awgibbs.com> writes:
> > A very common query against this table is of the form...
>
> > SELECT * FROM events WHERE entity_type_id = XXX ORDER BY published_at DESC LIMIT 25;
>
> > ... to get the most recent 25 events from the table for a given type
> > of entity, and generally the query planner does the expected thing of
> > using the two-part index on (entity_type_id, published_at).  Every now
> > and again, though, I have found the query planner deciding that it
> > ought use the single column (published_at) index.
>
> What is the estimated rows count according to EXPLAIN when it does that,
> versus when it chooses the better plan?
>
> > FWIW, we're running on 8.4.X and using the out-of-the-box
> > default_statistics_target setting and haven't dabbled with setting
> > table level statistics configurations.
>
> 8.4.X is due to reach EOL in July, so you really ought to be thinking
> about an upgrade.  It's not clear from the given info whether this issue
> is fixable with stats configuration adjustments, is a bug already fixed
> in later versions, or neither, but we're unlikely to make any significant
> changes in the 8.4 planner code at this point...
>
>             regards, tom lane
>
>