Обсуждение: Multicolumn index for single-column queries?
Hi. Say there are 2 indexes: "foo_index" btree (foo_id) "multi_index" btree (foo_id, approved, expires_at) foo_id is an integer. Some queries involve all three columns in their WHERE clauses, some involve only foo_id. Would it be ok from general performance standpoint to remove foo_index and rely only on multi_index? I know that PG would have to do less work updating just one index compared to updating them both, but wouldn't searches on foo_id alone become slower? Thanks.
On 18/04/2019 18:52, rihad wrote: > Hi. Say there are 2 indexes: > > "foo_index" btree (foo_id) > > "multi_index" btree (foo_id, approved, expires_at) > > > foo_id is an integer. Some queries involve all three columns in their > WHERE clauses, some involve only foo_id. > Would it be ok from general performance standpoint to remove foo_index > and rely only on multi_index? I know that > PG would have to do less work updating just one index compared to > updating them both, but wouldn't searches > on foo_id alone become slower? > > Thanks. > > > The multi column index will require more RAM to hold it. So if there is memory contention, then there would be an increased risk of swapping, leading to slower query times. I suspect that if there is more than enough RAM, then a multi column index will be slightly slower than a single column index. However, the difference will probably be lost in the noise -- in other words, the various things happening in the background will most likely to have far more significant impact on query duration. IMHO Cheers, Gavin
Am 18.04.19 um 08:52 schrieb rihad: > Hi. Say there are 2 indexes: > > "foo_index" btree (foo_id) > > "multi_index" btree (foo_id, approved, expires_at) > > > foo_id is an integer. Some queries involve all three columns in their > WHERE clauses, some involve only foo_id. > Would it be ok from general performance standpoint to remove foo_index > and rely only on multi_index? I know that > PG would have to do less work updating just one index compared to > updating them both, but wouldn't searches > on foo_id alone become slower? it depends . it depends on the queries you are using, on your workload. a multi-column-index will be large than an index over just one column, therefore you will have more disk-io when you read from such an index. Regards, Andreas -- 2ndQuadrant - The PostgreSQL Support Company. www.2ndQuadrant.com
Andreas Kretschmer wrote: > Am 18.04.19 um 08:52 schrieb rihad: > > Hi. Say there are 2 indexes: > > > > "foo_index" btree (foo_id) > > > > "multi_index" btree (foo_id, approved, expires_at) > > > > > > foo_id is an integer. Some queries involve all three columns in their > > WHERE clauses, some involve only foo_id. > > Would it be ok from general performance standpoint to remove foo_index > > and rely only on multi_index? I know that > > PG would have to do less work updating just one index compared to > > updating them both, but wouldn't searches > > on foo_id alone become slower? > > it depends . > > it depends on the queries you are using, on your workload. a > multi-column-index will be large than an index over just one column, > therefore you will have more disk-io when you read from such an index. To be more explicit: if you can live with a slightly less efficient index scan and want fast data modifications, use only the second index. If you hardly ever update the table, don't mind the wasted space and want every bit of query speed (data warehouse), having both indexes might be better. Yours, Laurenz Albe -- Cybertec | https://www.cybertec-postgresql.com
On 4/18/19 2:14 AM, Andreas Kretschmer wrote:
But two indexes are larger than one index, and updating two indexes requires more disk IO than updating one index.
(Prefix compression would obviate the need for this question. Then your multi-column index would be much smaller.)
Am 18.04.19 um 08:52 schrieb rihad:Hi. Say there are 2 indexes:
"foo_index" btree (foo_id)
"multi_index" btree (foo_id, approved, expires_at)
foo_id is an integer. Some queries involve all three columns in their WHERE clauses, some involve only foo_id.
Would it be ok from general performance standpoint to remove foo_index and rely only on multi_index? I know that
PG would have to do less work updating just one index compared to updating them both, but wouldn't searches
on foo_id alone become slower?
it depends .
it depends on the queries you are using, on your workload. a multi-column-index will be large than an index over just one column,
therefore you will have more disk-io when you read from such an index.
But two indexes are larger than one index, and updating two indexes requires more disk IO than updating one index.
(Prefix compression would obviate the need for this question. Then your multi-column index would be much smaller.)
--
Angular momentum makes the world go 'round.
Angular momentum makes the world go 'round.
On 19/04/2019 01:24, Ron wrote: > On 4/18/19 2:14 AM, Andreas Kretschmer wrote: >> >> >> Am 18.04.19 um 08:52 schrieb rihad: >>> Hi. Say there are 2 indexes: >>> >>> "foo_index" btree (foo_id) >>> >>> "multi_index" btree (foo_id, approved, expires_at) >>> >>> >>> foo_id is an integer. Some queries involve all three columns in >>> their WHERE clauses, some involve only foo_id. >>> Would it be ok from general performance standpoint to remove >>> foo_index and rely only on multi_index? I know that >>> PG would have to do less work updating just one index compared to >>> updating them both, but wouldn't searches >>> on foo_id alone become slower? >> >> it depends . >> >> it depends on the queries you are using, on your workload. a >> multi-column-index will be large than an index over just one column, >> therefore you will have more disk-io when you read from such an index. > > But two indexes are larger than one index, and updating two indexes > requires more disk IO than updating one index. Agreed. A key question would be: how often is the query run, compared to the frequency Insertions, Updates, and Deletions -- wrt the table. > > (Prefix compression would obviate the need for this question. Then > your multi-column index would be *much* smaller.) True, but a multi column index will still be bigger than single column index. [...]
Andreas Kretschmer <andreas@a-kretschmer.de> writes: > Am 18.04.19 um 08:52 schrieb rihad: >> Hi. Say there are 2 indexes: >> >> "foo_index" btree (foo_id) >> >> "multi_index" btree (foo_id, approved, expires_at) >> >> >> foo_id is an integer. Some queries involve all three columns in >> their WHERE clauses, some involve only foo_id. >> Would it be ok from general performance standpoint to remove >> foo_index and rely only on multi_index? I know that >> PG would have to do less work updating just one index compared to >> updating them both, but wouldn't searches >> on foo_id alone become slower? > > it depends . > > it depends on the queries you are using, on your workload. a > multi-column-index will be large than an index over just one column, > therefore you will have more disk-io when you read from such an index. I think it also depends on the average number of rows having the same foo_id.
On 4/18/19 8:45 AM, Gavin Flower wrote: > On 19/04/2019 01:24, Ron wrote: >> On 4/18/19 2:14 AM, Andreas Kretschmer wrote: [snip] >> >> (Prefix compression would obviate the need for this question. Then your >> multi-column index would be *much* smaller.) > > True, but a multi column index will still be bigger than single column index. TANSTAAFL. -- Angular momentum makes the world go 'round.
On 19/04/2019 02:12, Ron wrote: > On 4/18/19 8:45 AM, Gavin Flower wrote: >> On 19/04/2019 01:24, Ron wrote: >>> On 4/18/19 2:14 AM, Andreas Kretschmer wrote: > [snip] >>> >>> (Prefix compression would obviate the need for this question. Then >>> your multi-column index would be *much* smaller.) >> >> True, but a multi column index will still be bigger than single >> column index. > > TANSTAAFL. > > QUOTE: [Oh, 'tanstaafl.' Means ~There ain't no such thing as a free lunch.'] From The Moon is a Harsh Mistress, by Robert Heinlein. Published 1966 Is where I first came across TANSTAAFL. However, it appears to have been used at least as early as 1949. Just adding this, as probably there are many people who don't know the acronym.
On 19/04/2019 01:47, Harald Fuchs wrote: > Andreas Kretschmer <andreas@a-kretschmer.de> writes: > >> Am 18.04.19 um 08:52 schrieb rihad: >>> Hi. Say there are 2 indexes: >>> >>> "foo_index" btree (foo_id) >>> >>> "multi_index" btree (foo_id, approved, expires_at) >>> >>> >>> foo_id is an integer. Some queries involve all three columns in >>> their WHERE clauses, some involve only foo_id. >>> Would it be ok from general performance standpoint to remove >>> foo_index and rely only on multi_index? I know that >>> PG would have to do less work updating just one index compared to >>> updating them both, but wouldn't searches >>> on foo_id alone become slower? >> it depends . >> >> it depends on the queries you are using, on your workload. a >> multi-column-index will be large than an index over just one column, >> therefore you will have more disk-io when you read from such an index. > I think it also depends on the average number of rows having the same foo_id. > The number of rows referenced by an index entry for the multi_index will always be less than or equal to those for the matching foo_index. Also there will be fewer index entries per block for the multi_index. Which is why the I/O count will be higher; even in the best case, where there is an equal row referenced by the index entries.
On 19/04/2019 01:47, Harald Fuchs wrote: > Andreas Kretschmer<andreas@a-kretschmer.de> writes: > >> Am 18.04.19 um 08:52 schrieb rihad: >>> Hi. Say there are 2 indexes: >>> >>> "foo_index" btree (foo_id) >>> >>> "multi_index" btree (foo_id, approved, expires_at) >>> >>> >>> foo_id is an integer. Some queries involve all three columns in >>> their WHERE clauses, some involve only foo_id. >>> Would it be ok from general performance standpoint to remove >>> foo_index and rely only on multi_index? I know that >>> PG would have to do less work updating just one index compared to >>> updating them both, but wouldn't searches >>> on foo_id alone become slower? >> it depends . >> >> it depends on the queries you are using, on your workload. a >> multi-column-index will be large than an index over just one column, >> therefore you will have more disk-io when you read from such an index. > I think it also depends on the average number of rows having the same foo_id. > The number of rows referenced by an index entry for the multi_index will always be less than or equal to those for the matching foo_index. Also there will be fewer index entries per block for the multi_index, which is why the I/O count will be higher even in the best case where there is an equal number of rows referenced by each index entry.
On 19/04/2019 14:01, Gavin Flower wrote: [...] > > Also there will be fewer index entries per block for the multi_index, > which is why the I/O count will be higher even in the best case where > there is an equal number of rows referenced by each index entry. > Not sure why my system had this still in my draft folder! Sorry, for the duplication...