Обсуждение: Question about use_physical_tlist() which is applied on Scan path

Поиск
Список
Период
Сортировка

Question about use_physical_tlist() which is applied on Scan path

От
Jian Guo
Дата:

Hi hackers,

 

I have a question about `use_physical_tlist()` which is applied in `create_scan_plan()`:

```

if (flags == CP_IGNORE_TLIST)

{

  tlist = NULL;

}

else if (use_physical_tlist(root, best_path, flags))

{

  if (best_path->pathtype == T_IndexOnlyScan)

  {

    /* For index-only scan, the preferred tlist is the index's */

    tlist = copyObject(((IndexPath *) best_path)->indexinfo->indextlist);

 

    /*

     * Transfer sortgroupref data to the replacement tlist, if

     * requested (use_physical_tlist checked that this will work).

     */

    if (flags & CP_LABEL_TLIST)

      apply_pathtarget_labeling_to_tlist(tlist, best_path->pathtarget);

  }

  else

  {

tlist = build_physical_tlist(root, rel);

……

```

And the comment above the code block says:

 

```

/*

* For table scans, rather than using the relation targetlist (which is

* only those Vars actually needed by the query), we prefer to generate a

* tlist containing all Vars in order.  This will allow the executor to

* optimize away projection of the table tuples, if possible.

*

* But if the caller is going to ignore our tlist anyway, then don't

* bother generating one at all.  We use an exact equality test here, so

* that this only applies when CP_IGNORE_TLIST is the only flag set.

*/

```

 

But for some column-oriented database based on Postgres, it may help a lot in case of projection of the table tuples in execution? And is there any other optimization considerations behind this design?

 

e.g. If we have such table definition and a query:

 

```

CREATE TABLE partsupp

(PS_PARTKEY INT,

PS_SUPPKEY INT,

PS_AVAILQTY INTEGER,

PS_SUPPLYCOST DECIMAL(15,2),

PS_COMMENT VARCHAR(199),

dummy text);

 

explain analyze verbose select sum(ps_supplycost * ps_availqty) from partsupp;

```

 

And the planner would give such plan:

 

```

                                                    QUERY PLAN

-------------------------------------------------------------------------------------------------------------------

Aggregate  (cost=12.80..12.81 rows=1 width=32) (actual time=0.013..0.015 rows=1 loops=1)

   Output: sum((ps_supplycost * (ps_availqty)::numeric))

   ->  Seq Scan on public.partsupp  (cost=0.00..11.60 rows=160 width=22) (actual time=0.005..0.005 rows=0 loops=1)

         Output: ps_partkey, ps_suppkey, ps_availqty, ps_supplycost, ps_comment, dummy

Planning Time: 0.408 ms

Execution Time: 0.058 ms

(6 rows)

```

It looks the columns besides `ps_supplycost` and `ps_availqty` are not necessary, but fetched from tuples all at once. For the row-based storage such as heap, it looks fine, but for column-based storage, it would result into unnecessary overhead and impact performance. Is there any plan to optimize here?

 

Thanks.

Re: Question about use_physical_tlist() which is applied on Scan path

От
Alvaro Herrera
Дата:
On 2023-Jul-26, Jian Guo wrote:

> It looks the columns besides `ps_supplycost` and `ps_availqty` are not
> necessary, but fetched from tuples all at once. For the row-based
> storage such as heap, it looks fine, but for column-based storage, it
> would result into unnecessary overhead and impact performance. Is
> there any plan to optimize here?

I suppose that, at some point, it is going to have to be the table AM
the one that makes the decision.  That is, use_physical_tlist would have
to involve some new flag in path->parent->amflags to determine whether
to skip using a physical tlist.  Right now, we don't have any columnar
stores, so there's no way to verify an implementation.  If you do have a
columnar store implementation, you're welcome to share it.

-- 
Álvaro Herrera                                         PostgreSQL Developer
"I am amazed at [the pgsql-sql] mailing list for the wonderful support, and
lack of hesitasion in answering a lost soul's question, I just wished the rest
of the mailing list could be like this."                               (Fotis)
               (http://archives.postgresql.org/pgsql-sql/2006-06/msg00265.php)



Re: Question about use_physical_tlist() which is applied on Scan path

От
油屋
Дата:
I had the same question recently. In addition, I looked at the results of tpch which scale factor is 1 ran on postgres REL_15_STABLE and observed no performance improvement from physical tlist. To be specific, I run two versions of tpch, one with physical tlist enabled and one with physical tlist disabled. The performance improvement of some queries in the former was less than 5%, some queries performed worse than latter, and I think this is a normal range of performance fluctuations which was not caused by physical tlist. I have read the relevant commits, maybe because of my carelessness, I did not find the test queries corresponding to physical tlist. Are there any test queries of physical tlist?
Thanks

Alvaro Herrera <alvherre@alvh.no-ip.org> 于2023年7月26日周三 18:16写道:
On 2023-Jul-26, Jian Guo wrote:

> It looks the columns besides `ps_supplycost` and `ps_availqty` are not
> necessary, but fetched from tuples all at once. For the row-based
> storage such as heap, it looks fine, but for column-based storage, it
> would result into unnecessary overhead and impact performance. Is
> there any plan to optimize here?

I suppose that, at some point, it is going to have to be the table AM
the one that makes the decision.  That is, use_physical_tlist would have
to involve some new flag in path->parent->amflags to determine whether
to skip using a physical tlist.  Right now, we don't have any columnar
stores, so there's no way to verify an implementation.  If you do have a
columnar store implementation, you're welcome to share it.

--
Álvaro Herrera                                         PostgreSQL Developer
"I am amazed at [the pgsql-sql] mailing list for the wonderful support, and
lack of hesitasion in answering a lost soul's question, I just wished the rest
of the mailing list could be like this."                               (Fotis)
               (http://archives.postgresql.org/pgsql-sql/2006-06/msg00265.php)