Re: Guidelines on best indexing strategy for varying searches on 20+ columns

From: Jeff Janes
Subject: Re: Guidelines on best indexing strategy for varying searches on 20+ columns
Date:
Msg-id: CAMkU=1wT-HEzLR=G9ENaBabg17qCwKMQZA=21_+ZsDRdXGva0w@mail.gmail.com
In reply to: Guidelines on best indexing strategy for varying searches on 20+ columns  (Niels Kristian Schjødt <nielskristian@autouncle.com>)
Responses: Re: Guidelines on best indexing strategy for varying searches on 20+ columns
List: pgsql-performance
On Wed, Jun 25, 2014 at 1:48 AM, Niels Kristian Schjødt
<nielskristian@autouncle.com> wrote:
> Hi,
> I’m running a search engine for cars. It’s backed by a postgresql 9.3 installation.
>
> Now I’m unsure about the best approach/strategy on doing index optimization for the frontend search.
>
> The problem:
>
> The table containing the cars holds around 1.5 million rows. People that
> search for cars need different criteria to search by. Some search by
> brand/model, some by year, some by mileage, some by price and some by
> special equipment etc. etc. - and often they combine a whole bunch of
> criteria together. Of course some, like brand/model and price, are used
> more frequently than others. In total we offer: 9 category criteria like
> brand/model or body type, plus 5 numeric criteria like price or mileage,
> plus 12 boolean criteria like equipment. Lastly, people can order the
> results by different columns (year, price, mileage and a score we create
> about the cars). By default we order by our own generated score.
>
> What I’ve done so far:
>
> I have analyzed the usage of the criteria “lightly”, and created a few
> indexes (10). Among those are, e.g., indexes on price, mileage and a
> combined index on brand/model. Since we are only interested in showing
> results for cars which are actually for sale, the indexes are made as
> partial indexes on a sales state column.

I'd probably partition the data on whether it is for sale, and then
search only the for-sale partition.
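On 9.3 that would be inheritance-based partitioning; an untested sketch,
assuming the parent table is called "cars" and the state column is
"sale_state" (adjust names to your schema):

  -- children inherit all columns from the parent
  CREATE TABLE cars_for_sale     (CHECK (sale_state =  'for_sale')) INHERITS (cars);
  CREATE TABLE cars_not_for_sale (CHECK (sale_state <> 'for_sale')) INHERITS (cars);

  -- with constraint_exclusion = partition (the default), a query like
  --   SELECT * FROM cars WHERE sale_state = 'for_sale' AND price < 10000;
  -- skips cars_not_for_sale entirely.  You still need a trigger (or the
  -- application) to route inserts into the right child table.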

>
> Questions:
>
> 1. How would you go about analyzing and determining what columns should be indexed, and how?

I'd start out with intuition about which columns are likely to be used
most often, and in a selective way.  And follow up by logging slow
queries so they can be dissected at leisure.
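For the logging part, something along these lines in postgresql.conf is
usually enough (the 250ms threshold is just an illustration):

  log_min_duration_statement = 250ms      # log any statement slower than this

  # optionally also log the plans of the slow ones
  shared_preload_libraries = 'auto_explain'
  auto_explain.log_min_duration = 250ms
  auto_explain.log_analyze = on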

> 2. What is the best strategy when optimizing indexes for searches
> happening on 20+ columns, where the use and the combinations vary a lot?
> (To just index everything, to index some of the columns, to do combined
> indexes, to only do single column indexes etc. etc.)

There is no magic index.  Based on your description, you are going to
be seq scanning your table a lot.  Focus on making it as small as
possible, by horizontally partitioning it so that the not-for-sale
entries are hived off to a historical table, and vertically
partitioning it so that large columns rarely used in the WHERE clause
are in a separate table.  (Ideally you would tell PostgreSQL to
aggressively TOAST those columns, but there is no knob with which to
do that.)
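
Something like this for the column split (untested; "long_description"
and "equipment_text" are made-up names for wide columns you rarely
filter on):

  CREATE TABLE car_details (
      car_id           integer PRIMARY KEY REFERENCES cars (id),
      long_description text,
      equipment_text   text
  );

  -- the search queries then scan only the narrow cars table, and just
  -- the final page of 10-20 results gets joined against car_details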


> 3. I expect that it does not make sense to index all columns?

You mean individually, or jointly?  Either way, probably not.

> 4. I expect it does not make sense to index boolean columns?

In some cases it can, for example if the data distribution is very
lopsided and the rarer value is the one frequently specified.
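If one of your booleans is like that, a partial index is the natural
form for it.  Untested sketch; "has_towbar" and "score" are made-up
column names, "sale_state" is the same assumption as above:

  -- only the rare rows end up in the index, so it stays tiny
  CREATE INDEX cars_towbar_score_idx ON cars (score)
      WHERE has_towbar AND sale_state = 'for_sale';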

> 5. Is it better to do a combined index on 5 frequently used columns rather than having individual indexes on each of
them?

How often are the columns specified together?  If they are completely
independent it probably makes little sense to index them together.
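If, say, brand and model really do appear together (with equality) in a
large share of the queries, a combined index pays off; otherwise the
planner can still AND separate indexes together via bitmap scans.
Untested sketch, same made-up sale_state as above:

  CREATE INDEX cars_brand_model_idx ON cars (brand, model)
      WHERE sale_state = 'for_sale';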

> 6. Would it be a good idea to have all indexes sorted by my default sorting?

You don't get to choose.  A btree index is sorted by the columns
specified in the index, according to the operators specified (or
defaulted).  Unless you mean that you want to add the default sort
column as the lead column in each index - that actually might make
sense.
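
That would look something like this (untested; "score" is a stand-in
for whatever your generated score column is called):

  -- lets "ORDER BY score DESC LIMIT 30" walk the index in order and
  -- stop as soon as it has found 30 rows passing the other filters
  CREATE INDEX cars_score_idx ON cars (score DESC)
      WHERE sale_state = 'for_sale';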

> 7. Do you have some experience with other approaches that could greatly
> improve performance (e.g. forcing indexes to stay in memory etc.)?

If your queries are as unstructured as you imply, I'd forget about
indexes for the most part, as you will have a hard time finding ones
that work.  Concentrate on making seq scans as fast as possible.  If
most of your queries end in something like "ORDER by price limit 10"
then concentrate on index scans over price.  You will probably want to
include heuristics in your UI such that if people configure queries to
download half your database, you disallow that.  You will probably
find that 90% of the workload comes from people who are just playing
around with your website and don't actually intend to do business with
you.
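
As a concrete illustration of that last pattern (untested, same made-up
column names as above):

  CREATE INDEX cars_price_idx ON cars (price) WHERE sale_state = 'for_sale';

  SELECT * FROM cars
  WHERE sale_state = 'for_sale'
    AND mileage < 100000
  ORDER BY price
  LIMIT 10;

  -- the planner can walk cars_price_idx in price order, throw away rows
  -- failing the mileage test, and stop after the first 10 survivors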

Cheers,

Jeff

