Обсуждение: Why scan all columns when we select distinct c1?
PostgreSQL version: 16.1
Operating system: centos7
Description:
Let me show these explain results first, in PG9.4 and PG16.1.
### Behavior in PG9.4
``` SQL
gpadmin=# create table t1 (c1 int, c2 text);
CREATE TABLE
gpadmin=# explain (costs off, verbose) select distinct c1 from t1;
QUERY PLAN
-----------------------------
HashAggregate
Output: c1
Group Key: t1.c1
-> Seq Scan on public.t1
Output: c1 <---- pay attention <---- !!!
(5 rows)
```
### Behavior in PG 16.1
``` SQL
gpadmin=# create table t1 (c1 int, c2 text);
CREATE TABLE
gpadmin=# explain (costs off, verbose) select distinct c1 from t1;
QUERY PLAN
-----------------------------
HashAggregate
Output: c1
Group Key: t1.c1
-> Seq Scan on public.t1
Output: c1, c2 <---- pay attention <---- !!!
(5 rows)
```
My question is why scan all columns in PG 16.01?
If `select distinct c1`, scan the column `c1` is enough, like PG 9.4.
Related GPDB issue link: https://github.com/greenplum-db/gpdb/issues/15266
Reporter: David Kimura and Yongtao Huang
Operating system: centos7
Description:
Let me show these explain results first, in PG9.4 and PG16.1.
### Behavior in PG9.4
``` SQL
gpadmin=# create table t1 (c1 int, c2 text);
CREATE TABLE
gpadmin=# explain (costs off, verbose) select distinct c1 from t1;
QUERY PLAN
-----------------------------
HashAggregate
Output: c1
Group Key: t1.c1
-> Seq Scan on public.t1
Output: c1 <---- pay attention <---- !!!
(5 rows)
```
### Behavior in PG 16.1
``` SQL
gpadmin=# create table t1 (c1 int, c2 text);
CREATE TABLE
gpadmin=# explain (costs off, verbose) select distinct c1 from t1;
QUERY PLAN
-----------------------------
HashAggregate
Output: c1
Group Key: t1.c1
-> Seq Scan on public.t1
Output: c1, c2 <---- pay attention <---- !!!
(5 rows)
```
My question is why scan all columns in PG 16.01?
If `select distinct c1`, scan the column `c1` is enough, like PG 9.4.
Related GPDB issue link: https://github.com/greenplum-db/gpdb/issues/15266
Reporter: David Kimura and Yongtao Huang
On Sun, Jan 14, 2024 at 6:18 AM Yongtao Huang <yongtaoh2022@gmail.com> wrote:
PostgreSQL version: 16.1
Operating system: centos7
Description:
Let me show these explain results first, in PG9.4 and PG16.1.
### Behavior in PG9.4
``` SQL
gpadmin=# create table t1 (c1 int, c2 text);
CREATE TABLE
gpadmin=# explain (costs off, verbose) select distinct c1 from t1;
QUERY PLAN
-----------------------------
HashAggregate
Output: c1
Group Key: t1.c1
-> Seq Scan on public.t1
Output: c1 <---- pay attention <---- !!!
(5 rows)
```
### Behavior in PG 16.1
``` SQL
gpadmin=# create table t1 (c1 int, c2 text);
CREATE TABLE
gpadmin=# explain (costs off, verbose) select distinct c1 from t1;
QUERY PLAN
-----------------------------
HashAggregate
Output: c1
Group Key: t1.c1
-> Seq Scan on public.t1
Output: c1, c2 <---- pay attention <---- !!!
(5 rows)
```
My question is why scan all columns in PG 16.01?
If `select distinct c1`, scan the column `c1` is enough, like PG 9.4.
You can't scan just one column of a row-oriented table.
The real question is why it mentions c2.
Am 14.01.24 um 12:17 schrieb Yongtao Huang: > > > My question is why scan all columns in PG 16.01? > If `select distinct c1`, scan the column `c1` is enough, like PG 9.4. good question, I think because the seq scan always reads the column. If you create an index on c1, this changes: postgres=# create index idx1 on t1(c1); CREATE INDEX postgres=# set enable_seqscan to off; SET postgres=# explain (costs, verbose) select distinct c1 from t1; QUERY PLAN ------------------------------------------------------------------------------------- Unique (cost=0.15..63.93 rows=200 width=4) Output: c1 -> Index Only Scan using idx1 on public.t1 (cost=0.15..61.10 rows=1130 width=4) Output: c1 (4 rows) now we scan only the index and not the heap. Regards, Andreas -- Andreas Kretschmer CYBERTEC PostgreSQL Services and Support
Ron Johnson <ronljohnsonjr@gmail.com> writes: > On Sun, Jan 14, 2024 at 6:18 AM Yongtao Huang <yongtaoh2022@gmail.com> > wrote: >> gpadmin=# create table t1 (c1 int, c2 text); >> CREATE TABLE >> gpadmin=# explain (costs off, verbose) select distinct c1 from t1; >> QUERY PLAN >> ----------------------------- >> HashAggregate >> Output: c1 >> Group Key: t1.c1 >> -> Seq Scan on public.t1 >> Output: c1, c2 <---- pay attention <---- !!! >> (5 rows) >> >> My question is why scan all columns in PG 16.01? > You can't scan just one column of a row-oriented table. > The real question is why it mentions c2. The planner did that so that the SeqScan step doesn't have to perform a projection: it can just return (a pointer to) the physical tuple it found in the table, without doing extra work to form a tuple containing only c1. The upper HashAgg step won't really care. See use_physical_tlist() in createplan.c. What I'm confused about is why 9.4 didn't do the same. That optimization heuristic is very old, and certainly would be applied by 9.4 in some circumstances. Testing says the behavior in this specific case changed at 9.6. I'm not quite interested enough to drill down further... regards, tom lane
Thanks for the explanation.
Tom Lane <tgl@sss.pgh.pa.us> 于2024年1月14日周日 23:46写道:
Ron Johnson <ronljohnsonjr@gmail.com> writes:
> On Sun, Jan 14, 2024 at 6:18 AM Yongtao Huang <yongtaoh2022@gmail.com>
> wrote:
>> gpadmin=# create table t1 (c1 int, c2 text);
>> CREATE TABLE
>> gpadmin=# explain (costs off, verbose) select distinct c1 from t1;
>> QUERY PLAN
>> -----------------------------
>> HashAggregate
>> Output: c1
>> Group Key: t1.c1
>> -> Seq Scan on public.t1
>> Output: c1, c2 <---- pay attention <---- !!!
>> (5 rows)
>>
>> My question is why scan all columns in PG 16.01?
> You can't scan just one column of a row-oriented table.
> The real question is why it mentions c2.
The planner did that so that the SeqScan step doesn't have to
perform a projection: it can just return (a pointer to)
the physical tuple it found in the table, without doing extra
work to form a tuple containing only c1. The upper HashAgg
step won't really care. See use_physical_tlist() in createplan.c.
What I'm confused about is why 9.4 didn't do the same.
That optimization heuristic is very old, and certainly
would be applied by 9.4 in some circumstances. Testing
says the behavior in this specific case changed at 9.6.
I'm not quite interested enough to drill down further...
regards, tom lane
On 1/14/24 9:46 AM, Tom Lane wrote: > Ron Johnson<ronljohnsonjr@gmail.com> writes: >> You can't scan just one column of a row-oriented table. Technically you kinda can, depending on your definition of "scan". >> The real question is why it mentions c2. > The planner did that so that the SeqScan step doesn't have to > perform a projection: it can just return (a pointer to) > the physical tuple it found in the table, without doing extra > work to form a tuple containing only c1. The piece of info that's not mentioned here is how tuples (rows) are actually processed to extract individual datums (columns). The full details are in heap_deform_tuple() in backend/access/common/heaptuple.c, but the general gist is that (ignoring nulls) to read a tuple the code has to go datum by datum, computing the size of each datum to determine the physical location of the *next* datum. So if you want the 3rd datum in a tuple, you need to calculate the size of the 1st datum to see where the 2nd datum lives, and then compute the size of the 2nd datum to see where the 3rd one lives. In this example, if c1 is literally the first column in the table, then heap_deform_tuple is free to ignore everything else in the tuple, so long as the code calling heap_deform_tuple() knows to ask for only 1 datum. If all that sounds kinda expensive and tedious: you're right, it is, and it's why deforming tuples is generally done as late as possible. Based on what Tom's saying, in 9.6+ the HashAggregate code would be calling heap_deform_tuple(), and I'd expect it to only be retrieving c1. Without going through all the code, I think what's happening in 9.4 is the projection ends up calling heap_deform_tuple instead of the HashAgg code. It's still only grabbing c1; it's just doing it sooner rather than later. In this particular case I don't think it'd make much difference, but in more complicated queries it could certainly have a noticable effect. Either way, the explain output is kinda confusing. IMO it'd be more accurate if it said something like "Output: t1". And Ron's main point that you're going to be reading an entire row of t1 from the OS is also true. BTW, there's another place where the code waits as long as possible to access actual data in the hopes of avoiding needless work and that's values that have been TOASTed. heap_deform_tuple() doesn't actually need to de-toast data, so it will simply return a Datum that is a "pointer" (not a C pointer) to the toasted data. That will only be detoasted if something actually needs the actual data. In some cases that be a big performance win. -- Jim Nasby, Data Architect, Austin TX