Re: [GENERAL] DISTINCT vs GROUP BY - was Re: is (not) distinct from

Поиск

Список

Период

Сортировка

От	Sven R. Kunze
Тема	Re: [GENERAL] DISTINCT vs GROUP BY - was Re: is (not) distinct from
Дата	3 марта 2017 г. 12:51:16
Msg-id	b36b9115-8e46-16b7-4b4f-06186cda52a2@mail.de обсуждение исходный текст
Ответ на	[GENERAL] DISTINCT vs GROUP BY - was Re: is (not) distinct from (George Neuner <gneuner2@comcast.net>)
Список	pgsql-general

Дерево обсуждения

On 03.03.2017 06:26, George Neuner wrote:

I know most people here don't pay much - or any - attention to
SQLServer, however there was an interesting article recently regarding
significant performance differences between DISTINCT and GROUP BY as
used to remove duplicates.

https://sqlperformance.com/2017/01/t-sql-queries/surprises-assumptions-group-by-distinct

On a similar note, this is also an interesting read about the topic of distinct vs group by: https://blogs.oracle.com/developer/entry/counting_with_oracle_is_faster Interesting is the performance difference between integers and strings for PostgreSQL which doesn't exist for Oracle.I also tried rewriting "select distinct" to "select group by" using PostgreSQL. It didn't help; it was even worse (see appendix).

I'll get around to doing some testing soon.  For now, I am just asking
if anyone has ever run into something like this?

Yes, my team did. We use Django on a daily basis to generate SQL queries. In case of model-spanning queries, a lot of joining and duplications are involved. Distinct is the "generally" accepted way to remedy the situation but it's actually more like Tom said: distinct is a band-aid here. UNIONS and SUBSELECTs would be better I guess.Sven** Appendix **>>>># \d docs Table "public.docs" Column | Type | Modifiers --------+---------+--------------------------------------------------- id | integer | not null default nextval('docs_id_seq'::regclass) meta | jsonb | Indexes: "docs_pkey" PRIMARY KEY, btree (id)>>>># explain analyze select count(distinct meta->>'blood_group') from docs; QUERY PLAN ---------------------------------------------------------------------------------------------------------------------------- Aggregate (cost=760497.00..760497.01 rows=1 width=449) (actual time=37631.727..37631.727 rows=1 loops=1) -> Seq Scan on docs (cost=0.00..710497.00 rows=10000000 width=449) (actual time=0.500..3999.417 rows=10000000 loops=1) Planning time: 0.211 ms Execution time: 37631.829 ms(4 rows)>>>># explain analyze select count(*) from (select meta->>'blood_group' from docs group by meta->>'blood_group') as x; QUERY PLAN ---------------------------------------------------------------------------------------------------------------------------------------- Aggregate (cost=4441923.83..4441923.84 rows=1 width=0) (actual time=41189.472..41189.472 rows=1 loops=1) -> Group (cost=4241923.83..4316923.83 rows=10000000 width=449) (actual time=31303.690..41189.455 rows=8 loops=1) Group Key: ((docs.meta ->> 'blood_group'::text)) -> Sort (cost=4241923.83..4266923.83 rows=10000000 width=449) (actual time=31303.686..40475.227 rows=10000000 loops=1) Sort Key: ((docs.meta ->> 'blood_group'::text)) Sort Method: external merge Disk: 129328kB -> Seq Scan on docs (cost=0.00..735497.00 rows=10000000 width=449) (actual time=0.349..6433.691 rows=10000000 loops=1) Planning time: 2.189 ms Execution time: 41203.669 ms(9 rows)

В списке pgsql-general по дате отправления:

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: [GENERAL] DISTINCT vs GROUP BY - was Re: is (not) distinct from