Re: DISTINCT with btree skip scan

Поиск
Список
Период
Сортировка
От David Rowley
Тема Re: DISTINCT with btree skip scan
Дата
Msg-id CAApHDvow8-DokZExZwHB_rJ9euT9NGgxX7+Zuy0AKHpcueF1Qg@mail.gmail.com
обсуждение исходный текст
Ответ на DISTINCT with btree skip scan  (Thomas Munro <munro@ip9.org>)
Ответы Re: DISTINCT with btree skip scan
Список pgsql-hackers


On Sat, Jul 5, 2014 at 12:17 PM, Thomas Munro <munro@ip9.org> wrote:
postgres=# set enable_hashagg = false;
SET
Time: 0.302 ms
postgres=# explain select distinct a from foo;
┌─────────────────────────────────────────────────────────────────────────────────────────────────────┐
│                                             QUERY PLAN                                              │
├─────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ Index Only Scan for distinct prefix 1 using foo_pkey on foo  (cost=0.43..263354.20 rows=10 width=4) │
│ Planning time: 0.063 ms                                                                             │
└─────────────────────────────────────────────────────────────────────────────────────────────────────┘
(2 rows)

Time: 0.443 ms
postgres=# select distinct a from foo;
┌───┐
│ a │
├───┤
│ 0 │
│ 1 │
│ 2 │
│ 3 │
│ 4 │
│ 5 │
│ 6 │
│ 7 │
│ 8 │
│ 9 │
└───┘
(10 rows)

Time: 0.565 ms



Hi Thomas,

I've had a quick look at this and it seems like a great win! I'm quite surprised that we've not got this already. I think this technology could also really help performance of queries such as SELECT * from bigtable bt WHERE EXISTS(SELECT 1 FROM otherbigtable obt WHERE bt.somecol = obt.someIndexedButNonUniqueColumn); I know you're not proposing to improve that first off, but it could be done later once the benefits of this are more commonly realised.

I think our shortfalls in this area have not gone unnoticed. I was reading this post https://www.periscope.io/blog/count-distinct-in-mysql-postgres-sql-server-and-oracle.html about comparing performance of COUNT(DISTINCT col). I think this would give a big win for query 3 in that post. I'm trying to find some time make some changes to transform queries to allow the group by to happen before the joins when possible, so between that and your patch we'd be 2 steps closer to making query 1 in the link above perform a little better on PostgreSQL.

Do you think you'll manage to get time to look at this a bit more? I'd be keen to look into the costing side of it if you think that'll help you any?

Regards

David Rowley

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Ali Akbar
Дата:
Сообщение: Re: Function array_agg(array)
Следующее
От: "Amit Langote"
Дата:
Сообщение: Re: On partitioning