Re: [HACKERS] Removing useless DISTINCT clauses
От | Jeff Janes |
---|---|
Тема | Re: [HACKERS] Removing useless DISTINCT clauses |
Дата | |
Msg-id | CAMkU=1xg6_jX7dXdN68fx=choBKC9Ds+WM9mRSQu+TsHsy0mLA@mail.gmail.com обсуждение исходный текст |
Ответ на | [HACKERS] Removing useless DISTINCT clauses (David Rowley <david.rowley@2ndquadrant.com>) |
Ответы |
Re: [HACKERS] Removing useless DISTINCT clauses
|
Список | pgsql-hackers |
On Mon, Nov 6, 2017 at 1:16 AM, David Rowley <david.rowley@2ndquadrant.com> wrote:
Cheers,
In [1] we made a change to process the GROUP BY clause to remove any
group by items that are functionally dependent on some other GROUP BY
items.
This really just checks if a table's PK columns are entirely present
in the GROUP BY clause and removes anything else belonging to that
table.
All this seems to work well, but I totally failed to consider that the
exact same thing applies to DISTINCT too.
Over in [2], Rui Liu mentions that the planner could do a better job
for his case.
Using Rui Liu's example:
CREATE TABLE test_tbl ( k INT PRIMARY KEY, col text);
INSERT into test_tbl select generate_series(1,10000000), 'test';
Master:
postgres=# explain analyze verbose select distinct col, k from
test_tbl order by k limit 1000;
QUERY PLAN
------------------------------------------------------------ ------------------------------ ------------------------------ -------------------------
Limit (cost=1658556.19..1658563.69 rows=1000 width=9) (actual
time=8934.962..8935.495 rows=1000 loops=1)
Output: col, k
-> Unique (cost=1658556.19..1733557.50 rows=10000175 width=9)
(actual time=8934.961..8935.460 rows=1000 loops=1)
Output: col, k
-> Sort (cost=1658556.19..1683556.63 rows=10000175 width=9)
(actual time=8934.959..8935.149 rows=1000 loops=1)
Output: col, k
Sort Key: test_tbl.k, test_tbl.col
Sort Method: external merge Disk: 215128kB
-> Seq Scan on public.test_tbl (cost=0.00..154056.75
rows=10000175 width=9) (actual time=0.062..1901.728 rows=10000000
loops=1)
Output: col, k
Planning time: 0.092 ms
Execution time: 8958.687 ms
(12 rows)
Patched:
postgres=# explain analyze verbose select distinct col, k from
test_tbl order by k limit 1000;
QUERY PLAN
------------------------------------------------------------ ------------------------------ ------------------------------ ------------------------------ ----
Limit (cost=0.44..34.31 rows=1000 width=9) (actual time=0.030..0.895
rows=1000 loops=1)
Output: col, k
-> Unique (cost=0.44..338745.50 rows=10000175 width=9) (actual
time=0.029..0.814 rows=1000 loops=1)
Output: col, k
-> Index Scan using test_tbl_pkey on public.test_tbl
(cost=0.44..313745.06 rows=10000175 width=9) (actual time=0.026..0.452
rows=1000 loops=1)
Output: col, k
Planning time: 0.152 ms
Execution time: 0.985 ms
(8 rows)
A patch to implement this is attached.
Couldn't the Unique node be removed entirely? If k is a primary key, you can't have duplicates in need of removal.
Or would that be a subject for a different patch?
I think remove_functionally_dependant_groupclauses should have a more generic name, like remove_functionally_dependant_clauses.
Jeff
В списке pgsql-hackers по дате отправления:
Следующее
От: Peter EisentrautДата:
Сообщение: Re: [HACKERS] Proposal: Local indexes for partitioned table