Re: Extracting only the columns needed for a query

Поиск
Список
Период
Сортировка
От Melanie Plageman
Тема Re: Extracting only the columns needed for a query
Дата
Msg-id CAAKRu_Z355CurEzBJ1Mq3RS_dYqx3x0eUHhoOsQa+RQwHrje-w@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Extracting only the columns needed for a query  (Dmitry Dolgov <9erthalion6@gmail.com>)
Ответы Re: Extracting only the columns needed for a query
Re: Extracting only the columns needed for a query
Список pgsql-hackers


On Tue, Dec 17, 2019 at 2:57 AM Dmitry Dolgov <9erthalion6@gmail.com> wrote:

Thanks for the patch! If I understand correctly from this thread,
approach B is more preferable, so I've concentrated on the patch 0001
and have a few commentaries/questions:

Thanks so much for the review!
 

* The idea is to collect columns that have being used for selects/updates
  (where it makes sense for columnar storage to avoid extra work), do I see it
  right? If it's the case, then scanCols could be a bit misleading, since it
  gives an impression that it's only about reads.

The "scanCols" columns are only what will need to be scanned in order
to execute a query, so, even if a column is being "used", it may not
be set in "scanCols" if it is not required to scan it.

For example, a column which does not need to be scanned but is "used"
-- e.g. in UPDATE x SET col = 2; "col" will not be in "scanCols" because
it is known that it will be 2.

That makes me think that maybe the function name,
extract_used_columns() is bad, though. Maybe extract_scan_columns()?
I tried this in the attached, updated patch.
 

* After a quick experiment, it seems that extract_used_columns is invoked for
  updates, but collects too many colums, e.g:

    create table test (id int primary key, a text, b text, c text);
    update test set a = 'something' where id = 1;

  collects into scanCols all columns (varattno from 1 to 4) + again the first
  column from baserestrictinfo. Is it correct?

For UPDATE, we need all of the columns in the table because of the
table_lock() API's current expectation that the slot has all of the
columns populated. If we want UPDATE to only need to insert the column
values which have changed, table_tuple_lock() will have to change.

Collecting columns from the baserestrictinfo is important for when
that column isn't present in another part of the query, but it won't
double count it in the bitmap (when it is already present).
 

* Not sure if it supposed to be covered by this functionality, but if we do

    insert ... on conflict (uniq_col) do update set other_col = 'something'

  and actually have to perform an update, extract_used_columns is not called.


For UPSERT, you are correct that it will not extract scan columns.
It wasn't by design. It is because that UPDATE is planned as part of
an INSERT.
For an INSERT, in query_planner(), because the jointree has only one
relation and that relation is an RTE_RESULT planner does not continue
on to make_one_rel() and thus doesn't extract scan columns.

This means that for INSERT ON CONFLICT DO UPDATE, "scanCols" is not
populated, however, since UPDATE needs to scan all of the columns
anyway, I don't think populating "scanCols" would have any impact.
This does mean that that bitmap would be different for a regular
UPDATE vs an UPSERT, however, I don't think that doing the extra work
to populate it makes sense if it won't be used. What do you think?
 
* Probably it also makes sense to check IS_DUMMY_REL in extract_used_columns?


I am wondering, when IS_DUMMY_REL is true for a relation, do we
reference the associated RTE later? It seems like if it is a dummy
rel, we wouldn't scan it. It still makes sense to add it to
extract_used_columns(), I think, to avoid any wasted loops through the
rel's expressions. Thanks for the idea!

--
Melanie Plageman
Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: John Naylor
Дата:
Сообщение: Re: benchmarking Flex practices
Следующее
От: Michael Paquier
Дата:
Сообщение: Re: Removal of support for OpenSSL 0.9.8 and 1.0.0