Re: COPY FROM WHEN condition

Поиск
Список
Период
Сортировка
От Pavel Stehule
Тема Re: COPY FROM WHEN condition
Дата
Msg-id CAFj8pRCgNX9TUBD2SeoErMHs7bxb6xmxTpTFk1L_SqbEnxfinQ@mail.gmail.com
обсуждение исходный текст
Ответ на Re: COPY FROM WHEN condition  (Corey Huinker <corey.huinker@gmail.com>)
Ответы Re: COPY FROM WHEN condition  ("Daniel Verite" <daniel@manitou-mail.org>)
Список pgsql-hackers


pá 2. 11. 2018 v 3:57 odesílatel Corey Huinker <corey.huinker@gmail.com> napsal:
> Are you thinking something like having a COPY command that provides
> results in such a way that they could be referenced in a FROM clause
> (perhaps a COPY that defines a cursor…)?

That would also be nice, but what I was thinking of was that some
highly restricted subset of cases of SQL in general could lend
themselves to levels of optimization that would be impractical in
other contexts.

If COPY (or a syntactical equivalent) can return a result set, then the whole of SQL is available to filter and aggregate the results and we don't have to invent new syntax, or endure confusion whenCOPY-WHEN syntax behaves subtly different from a similar FROM-WHERE.

Also, what would we be saving computationally? The whole file (or program output) has to be consumed no matter what, the columns have to be parsed no matter what. At least some of the columns have to be converted to their assigned datatypes enough to know whether or not to filter the row, but we might be able push that logic inside a copy. I'm thinking of something like this:

SELECT x.a, sum(x.b)
FROM ( COPY INLINE '/path/to/foo.txt' FORMAT CSV ) as x( a integer, b numeric, c text, d date, e json) )
WHERE x.d >= '2018-11-01'

Without some special feature this example is not extra useful. It is based on copy on server that can use only super user with full rights.

What should be benefit of this feature?

Regards

Pavel


In this case, there is the opportunity to see the following optimizations:
- columns c and e are never referenced, and need never be turned into a datum (though we might do so just to confirm that they conform to the data type)
- if column d is converted first, we can filter on it and avoid converting columns a,b
- whatever optimizations we can infer from knowing that the two surviving columns will go directly into an aggregate

If we go this route, we can train the planner to notice other optimizations and add those mechanisms at that time, and then existing code gets faster.

If we go the COPY-WHEN route, then we have to make up new syntax for every possible future optimization.

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Michael Paquier
Дата:
Сообщение: Re: Getting ERROR: could not open file "base/13164/t3_16388" withpartition table with ON COMMIT
Следующее
От: Erik Rijkers
Дата:
Сообщение: Re: row filtering for logical replication