Question about use_physical_tlist() which is applied on Scan path

From	Jian Guo
Subject	Question about use_physical_tlist() which is applied on Scan path
Date	July 26, 2023, 09:40:28
Msg-id	SN6PR05MB51999AC1D043689076188370C400A@SN6PR05MB5199.namprd05.prod.outlook.com
Replies	Re: Question about use_physical_tlist() which is applied on Scan path
List	pgsql-hackers

Hi hackers,

I have a question about `use_physical_tlist()` which is applied in `create_scan_plan()`:

```
if (flags == CP_IGNORE_TLIST)
{
    tlist = NULL;
}
else if (use_physical_tlist(root, best_path, flags))
{
    if (best_path->pathtype == T_IndexOnlyScan)
    {
        /* For index-only scan, the preferred tlist is the index's */
        tlist = copyObject(((IndexPath *) best_path)->indexinfo->indextlist);

        /*
         * Transfer sortgroupref data to the replacement tlist, if
         * requested (use_physical_tlist checked that this will work).
         */
        if (flags & CP_LABEL_TLIST)
            apply_pathtarget_labeling_to_tlist(tlist, best_path->pathtarget);
    }
    else
    {
        tlist = build_physical_tlist(root, rel);
        ……
```
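The excerpt above uses two kinds of flag test: an exact comparison (`flags == CP_IGNORE_TLIST`) and a bit test (`flags & CP_LABEL_TLIST`). A minimal standalone sketch of the difference follows. The `TOY_*` names and their values are hypothetical stand-ins chosen for illustration, not the definitions from the PostgreSQL headers.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the planner's CP_* flag bits (values assumed). */
enum
{
    TOY_LABEL_TLIST = 1 << 0,
    TOY_IGNORE_TLIST = 1 << 1
};

/* Exact equality: true only when IGNORE is the sole flag set. */
static bool
toy_skip_tlist(int flags)
{
    return flags == TOY_IGNORE_TLIST;
}

/* Bit test: true whenever LABEL is set, regardless of other flags. */
static bool
toy_wants_labels(int flags)
{
    return (flags & TOY_LABEL_TLIST) != 0;
}
```

In this sketch, `toy_skip_tlist(TOY_IGNORE_TLIST | TOY_LABEL_TLIST)` is false because a second flag is present, while `toy_wants_labels()` is true for the same value.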

The comment above the quoted `create_scan_plan()` code says:

```
/*
 * For table scans, rather than using the relation targetlist (which is
 * only those Vars actually needed by the query), we prefer to generate a
 * tlist containing all Vars in order.  This will allow the executor to
 * optimize away projection of the table tuples, if possible.
 *
 * But if the caller is going to ignore our tlist anyway, then don't
 * bother generating one at all.  We use an exact equality test here, so
 * that this only applies when CP_IGNORE_TLIST is the only flag set.
 */
```
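To make "optimize away projection" concrete, here is a toy model, not PostgreSQL code. It assumes a target list can be represented as an array of 1-based attribute numbers, and it checks whether that list is exactly every column in physical order. When it is, a scan could hand back the stored tuple unchanged. The function name and the representation are invented for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Toy model (hypothetical, not PostgreSQL's data structures): a target list
 * is an array of 1-based attribute numbers. Returns true when the list is
 * exactly columns 1..natts in order, i.e. no projection step is needed.
 */
static bool
toy_tlist_is_physical(const int *tlist, size_t ntlist, size_t natts)
{
    if (ntlist != natts)
        return false;
    for (size_t i = 0; i < ntlist; i++)
    {
        if (tlist[i] != (int) (i + 1))
            return false;
    }
    return true;
}
```

For a six-column table, the list `{1, 2, 3, 4, 5, 6}` passes this check, while a list of only the needed columns, such as `{3, 4}`, does not and would require a projection step.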

But for a column-oriented database based on Postgres, projecting only the needed columns of the table tuples during execution could help a lot. Are there any other optimization considerations behind this design?

For example, suppose we have this table definition and query:

```
CREATE TABLE partsupp
(PS_PARTKEY INT,
 PS_SUPPKEY INT,
 PS_AVAILQTY INTEGER,
 PS_SUPPLYCOST DECIMAL(15,2),
 PS_COMMENT VARCHAR(199),
 dummy text);

explain analyze verbose select sum(ps_supplycost * ps_availqty) from partsupp;
```

The planner gives this plan:

```
                                                    QUERY PLAN
-------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=12.80..12.81 rows=1 width=32) (actual time=0.013..0.015 rows=1 loops=1)
   Output: sum((ps_supplycost * (ps_availqty)::numeric))
   ->  Seq Scan on public.partsupp  (cost=0.00..11.60 rows=160 width=22) (actual time=0.005..0.005 rows=0 loops=1)
         Output: ps_partkey, ps_suppkey, ps_availqty, ps_supplycost, ps_comment, dummy
 Planning Time: 0.408 ms
 Execution Time: 0.058 ms
(6 rows)
```

It looks like the columns other than `ps_supplycost` and `ps_availqty` are not needed, yet all of them are fetched from each tuple. For row-based storage such as heap this is fine, but for column-based storage it adds unnecessary overhead and hurts performance. Is there any plan to optimize this?
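The overhead in question can be sketched with a toy model. It assumes a hypothetical columnar scan that reads one column stream per distinct attribute in its target list. The function below is invented for illustration and is not part of PostgreSQL or any extension.

```c
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_ATTS 64

/*
 * Toy model: count the distinct columns a hypothetical columnar scan must
 * read for a target list given as 1-based attribute numbers.
 * Out-of-range attribute numbers are ignored.
 */
static size_t
toy_columns_read(const int *tlist, size_t ntlist)
{
    bool seen[TOY_MAX_ATTS] = {false};
    size_t count = 0;

    for (size_t i = 0; i < ntlist; i++)
    {
        int attno = tlist[i];

        if (attno < 1 || attno > TOY_MAX_ATTS)
            continue;
        if (!seen[attno - 1])
        {
            seen[attno - 1] = true;
            count++;
        }
    }
    return count;
}
```

For `partsupp`, the physical tlist `{1, 2, 3, 4, 5, 6}` makes this model read six columns, while a tlist holding only `ps_availqty` (3) and `ps_supplycost` (4) reads two.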

Thanks.
