Обсуждение: Multicolumn index for single-column queries?

Поиск
Список
Период
Сортировка

Multicolumn index for single-column queries?

От
rihad
Дата:
Hi. Say there are 2 indexes:

     "foo_index" btree (foo_id)

     "multi_index" btree (foo_id, approved, expires_at)


foo_id is an integer. Some queries involve all three columns in their WHERE clauses, some involve only foo_id.
Would it be ok from general performance standpoint to remove foo_index and rely only on multi_index? I know that
PG would have to do less work updating just one index compared to updating them both, but wouldn't searches
on foo_id alone become slower?

Thanks.




Re: Multicolumn index for single-column queries?

От
Gavin Flower
Дата:
On 18/04/2019 18:52, rihad wrote:
> Hi. Say there are 2 indexes:
>
>     "foo_index" btree (foo_id)
>
>     "multi_index" btree (foo_id, approved, expires_at)
>
>
> foo_id is an integer. Some queries involve all three columns in their 
> WHERE clauses, some involve only foo_id.
> Would it be ok from general performance standpoint to remove foo_index 
> and rely only on multi_index? I know that
> PG would have to do less work updating just one index compared to 
> updating them both, but wouldn't searches
> on foo_id alone become slower?
>
> Thanks.
>
>
>
The multi column index will require more RAM to hold it.  So if there is 
memory contention, then there would be an increased risk of swapping, 
leading to slower query times.

I suspect that if there is more than enough RAM, then a multi column 
index will be slightly slower than a single column index. However, the 
difference will probably be lost in the noise -- in other words, the 
various things happening in the background will most likely to have far 
more significant impact on query duration.  IMHO


Cheers,
Gavin





Re: Multicolumn index for single-column queries?

От
Andreas Kretschmer
Дата:

Am 18.04.19 um 08:52 schrieb rihad:
> Hi. Say there are 2 indexes:
>
>     "foo_index" btree (foo_id)
>
>     "multi_index" btree (foo_id, approved, expires_at)
>
>
> foo_id is an integer. Some queries involve all three columns in their 
> WHERE clauses, some involve only foo_id.
> Would it be ok from general performance standpoint to remove foo_index 
> and rely only on multi_index? I know that
> PG would have to do less work updating just one index compared to 
> updating them both, but wouldn't searches
> on foo_id alone become slower? 

it depends .

it depends on the queries you are using, on your workload. a 
multi-column-index will be large than an index over just one column,
therefore you will have more disk-io when you read from such an index.


Regards, Andreas

-- 
2ndQuadrant - The PostgreSQL Support Company.
www.2ndQuadrant.com




Re: Multicolumn index for single-column queries?

От
Laurenz Albe
Дата:
Andreas Kretschmer wrote:
> Am 18.04.19 um 08:52 schrieb rihad:
> > Hi. Say there are 2 indexes:
> > 
> >     "foo_index" btree (foo_id)
> > 
> >     "multi_index" btree (foo_id, approved, expires_at)
> > 
> > 
> > foo_id is an integer. Some queries involve all three columns in their 
> > WHERE clauses, some involve only foo_id.
> > Would it be ok from general performance standpoint to remove foo_index 
> > and rely only on multi_index? I know that
> > PG would have to do less work updating just one index compared to 
> > updating them both, but wouldn't searches
> > on foo_id alone become slower? 
> 
> it depends .
> 
> it depends on the queries you are using, on your workload. a 
> multi-column-index will be large than an index over just one column,
> therefore you will have more disk-io when you read from such an index.

To be more explicit: if you can live with a slightly less efficient
index scan and want fast data modifications, use only the second index.

If you hardly ever update the table, don't mind the wasted space and
want every bit of query speed (data warehouse), having both indexes
might be better.

Yours,
Laurenz Albe
-- 
Cybertec | https://www.cybertec-postgresql.com




Re: Multicolumn index for single-column queries?

От
Ron
Дата:
On 4/18/19 2:14 AM, Andreas Kretschmer wrote:


Am 18.04.19 um 08:52 schrieb rihad:
Hi. Say there are 2 indexes:

    "foo_index" btree (foo_id)

    "multi_index" btree (foo_id, approved, expires_at)


foo_id is an integer. Some queries involve all three columns in their WHERE clauses, some involve only foo_id.
Would it be ok from general performance standpoint to remove foo_index and rely only on multi_index? I know that
PG would have to do less work updating just one index compared to updating them both, but wouldn't searches
on foo_id alone become slower?

it depends .

it depends on the queries you are using, on your workload. a multi-column-index will be large than an index over just one column,
therefore you will have more disk-io when you read from such an index.

But two indexes are larger than one index, and updating two indexes requires more disk IO than updating one index.

(Prefix compression would obviate the need for this question.  Then your multi-column index would be much smaller.)


--
Angular momentum makes the world go 'round.

Re: Multicolumn index for single-column queries?

От
Gavin Flower
Дата:
On 19/04/2019 01:24, Ron wrote:
> On 4/18/19 2:14 AM, Andreas Kretschmer wrote:
>>
>>
>> Am 18.04.19 um 08:52 schrieb rihad:
>>> Hi. Say there are 2 indexes:
>>>
>>>     "foo_index" btree (foo_id)
>>>
>>>     "multi_index" btree (foo_id, approved, expires_at)
>>>
>>>
>>> foo_id is an integer. Some queries involve all three columns in 
>>> their WHERE clauses, some involve only foo_id.
>>> Would it be ok from general performance standpoint to remove 
>>> foo_index and rely only on multi_index? I know that
>>> PG would have to do less work updating just one index compared to 
>>> updating them both, but wouldn't searches
>>> on foo_id alone become slower? 
>>
>> it depends .
>>
>> it depends on the queries you are using, on your workload. a 
>> multi-column-index will be large than an index over just one column,
>> therefore you will have more disk-io when you read from such an index.
>
> But two indexes are larger than one index, and updating two indexes 
> requires more disk IO than updating one index.

Agreed.

A key question would be: how often is the query run, compared to the 
frequency Insertions, Updates, and Deletions -- wrt the table.

>
> (Prefix compression would obviate the need for this question. Then 
> your multi-column index would be *much* smaller.)

True, but a multi column index will still be bigger than single column 
index.

[...]



Re: Multicolumn index for single-column queries?

От
Harald Fuchs
Дата:
Andreas Kretschmer <andreas@a-kretschmer.de> writes:

> Am 18.04.19 um 08:52 schrieb rihad:
>> Hi. Say there are 2 indexes:
>>
>>     "foo_index" btree (foo_id)
>>
>>     "multi_index" btree (foo_id, approved, expires_at)
>>
>>
>> foo_id is an integer. Some queries involve all three columns in
>> their WHERE clauses, some involve only foo_id.
>> Would it be ok from general performance standpoint to remove
>> foo_index and rely only on multi_index? I know that
>> PG would have to do less work updating just one index compared to
>> updating them both, but wouldn't searches
>> on foo_id alone become slower? 
>
> it depends .
>
> it depends on the queries you are using, on your workload. a
> multi-column-index will be large than an index over just one column,
> therefore you will have more disk-io when you read from such an index.

I think it also depends on the average number of rows having the same foo_id.




Re: Multicolumn index for single-column queries?

От
Ron
Дата:
On 4/18/19 8:45 AM, Gavin Flower wrote:
> On 19/04/2019 01:24, Ron wrote:
>> On 4/18/19 2:14 AM, Andreas Kretschmer wrote:
[snip]
>>
>> (Prefix compression would obviate the need for this question. Then your 
>> multi-column index would be *much* smaller.)
>
> True, but a multi column index will still be bigger than single column index.

TANSTAAFL.


-- 
Angular momentum makes the world go 'round.



Re: Multicolumn index for single-column queries?

От
Gavin Flower
Дата:
On 19/04/2019 02:12, Ron wrote:
> On 4/18/19 8:45 AM, Gavin Flower wrote:
>> On 19/04/2019 01:24, Ron wrote:
>>> On 4/18/19 2:14 AM, Andreas Kretschmer wrote:
> [snip]
>>>
>>> (Prefix compression would obviate the need for this question. Then 
>>> your multi-column index would be *much* smaller.)
>>
>> True, but a multi column index will still be bigger than single 
>> column index.
>
> TANSTAAFL.
>
>
QUOTE: [Oh, 'tanstaafl.' Means ~There ain't no such thing as a free lunch.']
 From The Moon is a Harsh Mistress, by Robert Heinlein. Published 1966
Is where I first came across TANSTAAFL.

However, it appears to have been used at least as early as 1949.

Just adding this, as probably there are many people who don't know the 
acronym.




Re: Multicolumn index for single-column queries?

От
Gavin Flower
Дата:
On 19/04/2019 01:47, Harald Fuchs wrote:
> Andreas Kretschmer <andreas@a-kretschmer.de> writes:
>
>> Am 18.04.19 um 08:52 schrieb rihad:
>>> Hi. Say there are 2 indexes:
>>>
>>>      "foo_index" btree (foo_id)
>>>
>>>      "multi_index" btree (foo_id, approved, expires_at)
>>>
>>>
>>> foo_id is an integer. Some queries involve all three columns in
>>> their WHERE clauses, some involve only foo_id.
>>> Would it be ok from general performance standpoint to remove
>>> foo_index and rely only on multi_index? I know that
>>> PG would have to do less work updating just one index compared to
>>> updating them both, but wouldn't searches
>>> on foo_id alone become slower?
>> it depends .
>>
>> it depends on the queries you are using, on your workload. a
>> multi-column-index will be large than an index over just one column,
>> therefore you will have more disk-io when you read from such an index.
> I think it also depends on the average number of rows having the same foo_id.
>
The number of rows referenced by an index entry for the multi_index will 
always be less than or equal to those for the matching foo_index.

Also there will be fewer index entries per block for the multi_index.  
Which is why the I/O count will be higher; even in the best case, where 
there is an equal row referenced by the index entries.




Re: Multicolumn index for single-column queries?

От
Gavin Flower
Дата:
On 19/04/2019 01:47, Harald Fuchs wrote:
> Andreas Kretschmer<andreas@a-kretschmer.de>  writes:
>
>> Am 18.04.19 um 08:52 schrieb rihad:
>>> Hi. Say there are 2 indexes:
>>>
>>>      "foo_index" btree (foo_id)
>>>
>>>      "multi_index" btree (foo_id, approved, expires_at)
>>>
>>>
>>> foo_id is an integer. Some queries involve all three columns in
>>> their WHERE clauses, some involve only foo_id.
>>> Would it be ok from general performance standpoint to remove
>>> foo_index and rely only on multi_index? I know that
>>> PG would have to do less work updating just one index compared to
>>> updating them both, but wouldn't searches
>>> on foo_id alone become slower?
>> it depends .
>>
>> it depends on the queries you are using, on your workload. a
>> multi-column-index will be large than an index over just one column,
>> therefore you will have more disk-io when you read from such an index.
> I think it also depends on the average number of rows having the same foo_id.
>
The number of rows referenced by an index entry for the multi_index will 
always be less than or equal to those for the matching foo_index.

Also there will be fewer index entries per block for the multi_index, 
which is why the I/O count will be higher even in the best case where 
there is an equal number of rows referenced by each index entry.




Re: Multicolumn index for single-column queries?

От
Gavin Flower
Дата:
On 19/04/2019 14:01, Gavin Flower wrote:
[...]
>
> Also there will be fewer index entries per block for the multi_index, 
> which is why the I/O count will be higher even in the best case where 
> there is an equal number of rows referenced by each index entry.
>
Not sure why my system had this still in my draft folder!


Sorry, for the duplication...