Re: Guidelines on best indexing strategy for varying searches on 20+ columns

Поиск
Список
Период
Сортировка
От Niels Kristian Schjødt
Тема Re: Guidelines on best indexing strategy for varying searches on 20+ columns
Дата
Msg-id FF969CCD-C1B9-48DE-A6B2-72687B5C4043@autouncle.com
обсуждение исходный текст
Ответ на Re: Guidelines on best indexing strategy for varying searches on 20+ columns  (Jeff Janes <jeff.janes@gmail.com>)
Список pgsql-performance
Thanks for the answers.

> Den 30/06/2014 kl. 20.04 skrev Jeff Janes <jeff.janes@gmail.com>:
>
> On Wed, Jun 25, 2014 at 1:48 AM, Niels Kristian Schjødt
> <nielskristian@autouncle.com> wrote:
>> Hi,
>> I’m running a search engine for cars. It’s backed by a postgresql 9.3 installation.
>>
>> Now I’m unsure about the best approach/strategy on doing index optimization for the fronted search.
>>
>> The problem:
>>
>> The table containing the cars holds a around 1,5 million rows. People that searches for cars needs different
criteriato search by. Some search by brand/model, some by year, some by mileage, some by price and some by special
equipmentetc. etc. - and often they combine a whole bunch of criteria together. Of cause some, like brand/mode and
price,are used more frequently than others. In total we offer: 9 category criteria like brand/model or body type, plus
5numeric criteria like price or mileage, plus 12 boolean criteria like equipment. Lastly people can order the results
bydifferent columns (year, price, mileage and a score we create about the cars). By default we order by our own
generatedscore. 
>>
>> What I’ve done so far:
>>
>> I have analyzed the usage of the criteria “lightly”, and created a few indexes (10). Among those, are e.g. indexes
onprice, mileage and a combined index on brand/model. Since we are only interested in showing results for cars which is
actuallyfor sale, the indexes are made as partial indexes on a sales state column. 
>
> I'd probably partition the data on whether it is for sale, and then
> search only the for-sale partition.

Hmm okay, I already did all the indexes partial based on the for-sales state. If the queries always queries for-sale,
andall indexes are partial based on those, will it then still help performance / make sense to partition the tables? 
>
>>
>> Questions:
>>
>> 1. How would you go about analyzing and determining what columns should be indexed, and how?
>
> I'd start out with intuition about which columns are likely to be used
> most often, and in a selective way.  And followup by logging slow
> queries so they can be dissected at leisure.
>
>> 2. What is the best strategy when optimizing indexes for searches happening on 20 + columns, where the use and the
combinationsvaries a lot? (To just index everything, to index some of the columns, to do combined indexes, to only do
singlecolumn indexes etc. etc.) 
>
> There is no magic index.  Based on your description, you are going to
> be seq scanning your table a lot.  Focus on making it as small as
> possible, but vertical partitioning it so that the not-for-sale
> entries are hived off to an historical table, and horizontally
> partitioning it so that large columns rarely used in the where clause
> are in a separate table (Ideally you would tell postgresql to
> aggressively toast those columns, but there is no knob with which to
> do that)
>
>
>> 3. I expect that it does not make sense to index all columns?
>
> You mean individually, or jointly?  Either way, probably not.
>
>> 4. I expect it does not make sense to index boolean columns?
>
> In some cases it can, for example if the data distribution is very
> lopsided and the value with the smaller side is frequently specified.
>
>> 5. Is it better to do a combined index on 5 frequently used columns rather than having individual indexes on each of
them?
>
> How often are the columns specified together?  If they are completely
> independent it probably makes little sense to index them together.
>
>> 6. Would it be a goof idea to have all indexes sorted by my default sorting?
>
> You don't get to choose.  An btree index is sorted by the columns
> specified in the index, according to the operators specified (or
> defaulted).  Unless you mean that you want to add the default sort
> column to be the lead column in each index, that actually might make
> sense.
>
>> 7. Do you have so experiences with other approaches that could greatly improve performance (e.g. forcing indexes to
stayin memory etc.)? 
>
> If your queries are as unstructured as you imply, I'd forget about
> indexes for the most part, as you will have a hard time findings ones
> that work.  Concentrate on making seq scans as fast as possible.  If
> most of your queries end in something like "ORDER by price limit 10"
> then concentrate on index scans over price.  You will probably want to
> include heuristics in your UI such that if people configure queries to
> download half your database, you disallow that.  You will probably
> find that 90% of the workload comes from people who are just playing
> around with your website and don't actually intend to do business with
> you.
>
> Cheers,
>
> Jeff


В списке pgsql-performance по дате отправления:

Предыдущее
От: Tom Lane
Дата:
Сообщение: Re: Postgres Replaying WAL slowly
Следующее
От: Marc Mamin
Дата:
Сообщение: fragmention issue with ext4: e4defrag?