Обсуждение: Forcing index usage

Поиск
Список
Период
Сортировка

Forcing index usage

От
Jonathan Marks
Дата:
Hi folks —

We’ve got several tables that are relatively large (~250-500GB in total size) that use very large (tens to hundreds of
GB)GIN indexes for full-text search. We’ve set the column statistics for our tsvector columns as high as they go
(10,000).We almost always order our search results by a separate date column (which has an index) and we almost always
usea limit. 

Whenever the query planner chooses to use the indexes, queries on these tables are somewhat fast, maxing out at a few
hundredmilliseconds per query (which is terrible, but acceptable to end users). 

When the query planner chooses not to use the indexes, queries can take many tens of seconds if they ever finish at
all.When this happens, the query planner usually chooses to use the date index instead of the GIN index, and that is
almostalways a bad idea. We have sometimes been able to trick it into a better query plan by also adding the tsvector
columnin the ORDER BY clause, but that has bad performance implications if the result set is large. 

Is there a way to tell Postgres “please don’t use index X when queries that could use index Y instead occur?”

Thank you!


Re: Forcing index usage

От
Michael Lewis
Дата:
Is there a way to tell Postgres “please don’t use index X when queries that could use index Y instead occur?”

No. But you could re-write the query to make the date index useless. The simplest way that comes to mind is putting the query that does your full-text search in a CTE (WITH keyword, it is an optimization boundary) and then ordering and applying your limit to the materialized set that comes out of that. eg.

WITH cte_full_text_results AS(
SELECT date, result FROM big_a_table WHERE text = 'whatever'
)
SELECT * FROM cte_full_text_results ORDER BY date DESC limit 10;

Re: Forcing index usage

От
Stephen Frost
Дата:
Greetings,

* Michael Lewis (mlewis@entrata.com) wrote:
> > Is there a way to tell Postgres “please don’t use index X when queries
> > that could use index Y instead occur?”
>
> No. But you could re-write the query to make the date index useless. The
> simplest way that comes to mind is putting the query that does your
> full-text search in a CTE (WITH keyword, it is an optimization boundary)
> and then ordering and applying your limit to the materialized set that
> comes out of that. eg.
>
> WITH cte_full_text_results AS(
> SELECT date, result FROM big_a_table WHERE text = 'whatever'
> )
> SELECT * FROM cte_full_text_results ORDER BY date DESC limit 10;

Note that in v12, you'll need to include the MATERIALIZE keyword,
otherwise we'll in-line the CTE and you might get the plan you don't
like.  That said, it seems a bit unfortunate that there's no clear way
to create an index which specifically answers this query; figuring out a
way to do that could be very beneficial in a number of areas.  The RUM
index type attempts to improve things here, as I understand it.

Thanks!

Stephen

Вложения

Re: Forcing index usage

От
Michael Lewis
Дата:
Thanks for that advance warning since it is a handy option to force the planning barrier in my experience. What's a resource to see other coming changes in v12 especially changes to default behavior like this? Will there be a new cte_collapse_limit setting or similar?

Re: Forcing index usage

От
Stephen Frost
Дата:
Greetings,

* Michael Lewis (mlewis@entrata.com) wrote:
> Thanks for that advance warning since it is a handy option to force the
> planning barrier in my experience. What's a resource to see other coming
> changes in v12 especially changes to default behavior like this? Will there
> be a new cte_collapse_limit setting or similar?

Check the release notes.

Thanks,

Stephen

Вложения

Re: Forcing index usage

От
Bruce Momjian
Дата:
On Wed, Apr  3, 2019 at 03:28:48PM -0400, Stephen Frost wrote:
> Greetings,
> 
> * Michael Lewis (mlewis@entrata.com) wrote:
> > Thanks for that advance warning since it is a handy option to force the
> > planning barrier in my experience. What's a resource to see other coming
> > changes in v12 especially changes to default behavior like this? Will there
> > be a new cte_collapse_limit setting or similar?
> 
> Check the release notes.

Yes, once they are written in a few weeks.

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +



Re: Forcing index usage

От
Michael Lewis
Дата:
> * Michael Lewis (mlewis@entrata.com) wrote:
> > Thanks for that advance warning since it is a handy option to force the
> > planning barrier in my experience. What's a resource to see other coming
> > changes in v12 especially changes to default behavior like this? Will there
> > be a new cte_collapse_limit setting or similar?
>
> Check the release notes.

Yes, once they are written in a few weeks.

Thanks for clarifying Bruce. On #postgresql I was directed to the development docs ( https://www.postgresql.org/docs/devel/queries-with.html ) and got the additional information I needed. It is unfortunate there is no MATERIALIZED keyword in v11 to ensure I can preserve existing behavior during upgrade to v12 on queries that seem to benefit greatly from that optimization barrier, but I can appreciate why that decision was made.

Re: Forcing index usage

От
Bruce Momjian
Дата:
On Wed, Apr 17, 2019 at 11:16:28AM -0600, Michael Lewis wrote:
>     > * Michael Lewis (mlewis@entrata.com) wrote:
>     > > Thanks for that advance warning since it is a handy option to force the
>     > > planning barrier in my experience. What's a resource to see other
>     coming
>     > > changes in v12 especially changes to default behavior like this? Will
>     there
>     > > be a new cte_collapse_limit setting or similar?
>     >
>     > Check the release notes.
> 
>     Yes, once they are written in a few weeks.
> 
> 
> Thanks for clarifying Bruce. On #postgresql I was directed to the development
> docs ( https://www.postgresql.org/docs/devel/queries-with.html ) and got the
> additional information I needed. It is unfortunate there is no MATERIALIZED
> keyword in v11 to ensure I can preserve existing behavior during upgrade to v12
> on queries that seem to benefit greatly from that optimization barrier, but I
> can appreciate why that decision was made.

Yes, there was a long discussions about it on IRC on April 3.  There was
the idea that we could backpatch the MATERIALIZED keyword to PG 11 and
earlier and then users could change queries before upgrading to force
the existing behavior in PG 12.  However, that is not without risk, and
Postgres has a tendency to just push such problems to the applications
rather than add complexity to the server:

    https://momjian.us/main/blogs/pgblog/2019.html#February_25_2019

Here is discussion about adding a GUC to set the old behavior, but was
rejected:

    https://www.postgresql.org/message-id/flat/000001d4caed%24d29b9ae0%2477d2d0a0%24%40pcorp.us

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +



Re: Forcing index usage

От
Jeff Janes
Дата:
On Wed, Apr 3, 2019 at 12:13 PM Jonathan Marks <jonathanaverymarks@gmail.com> wrote:

Is there a way to tell Postgres “please don’t use index X when queries that could use index Y instead occur?”

Late to the party here, but...

Not directly.  I've had luck in changing the procost of functions (or the functions which back the operators) which are frequently executed in the slower plan, but not frequently executed in the faster plan.  For example, walking the time index executing @@ on each row until it finds enough is probably going to involve a lot more @@ than using the full text index and invoking @@ only on the recheck rows.  The cost of @@ (via "ts_match_vq") is probably way too low, especially if it has to fish an argument out of TOAST.

If that doesn't work, you can just change the query to prevent the bad index from being used.

For example, PostgreSQL is not currently smart enough to use an index on "mtime" to support the ordering of a query that is written as:

ORDER BY mtime + interval '0 seconds'

Cheers,

Jeff