Re: COPY FROM WHEN condition

Поиск
Список
Период
Сортировка
От Tomas Vondra
Тема Re: COPY FROM WHEN condition
Дата
Msg-id 0623219e-c034-bc10-c546-a01da35e6349@2ndquadrant.com
обсуждение исходный текст
Ответ на Re: COPY FROM WHEN condition  (Corey Huinker <corey.huinker@gmail.com>)
Список pgsql-hackers

On 11/02/2018 03:57 AM, Corey Huinker wrote:
>     > Are you thinking something like having a COPY command that provides
>     > results in such a way that they could be referenced in a FROM clause
>     > (perhaps a COPY that defines a cursor…)?
> 
>     That would also be nice, but what I was thinking of was that some
>     highly restricted subset of cases of SQL in general could lend
>     themselves to levels of optimization that would be impractical in
>     other contexts.
> 
> 
> If COPY (or a syntactical equivalent) can return a result set, then the
> whole of SQL is available to filter and aggregate the results and we
> don't have to invent new syntax, or endure confusion whenCOPY-WHEN
> syntax behaves subtly different from a similar FROM-WHERE.
> 
> Also, what would we be saving computationally? The whole file (or
> program output) has to be consumed no matter what, the columns have to
> be parsed no matter what. At least some of the columns have to be
> converted to their assigned datatypes enough to know whether or not to
> filter the row, but we might be able push that logic inside a copy. I'm
> thinking of something like this:
> 
>     SELECT x.a, sum(x.b)
>     FROM ( COPY INLINE '/path/to/foo.txt' FORMAT CSV ) as x( a integer,
>     b numeric, c text, d date, e json) )
>     WHERE x.d >= '2018-11-01'
> 
> 
> In this case, there is the /opportunity/ to see the following optimizations:
> - columns c and e are never referenced, and need never be turned into a
> datum (though we might do so just to confirm that they conform to the
> data type)
> - if column d is converted first, we can filter on it and avoid
> converting columns a,b
> - whatever optimizations we can infer from knowing that the two
> surviving columns will go directly into an aggregate
> 
> If we go this route, we can train the planner to notice other
> optimizations and add those mechanisms at that time, and then existing
> code gets faster.
> 
> If we go the COPY-WHEN route, then we have to make up new syntax for
> every possible future optimization.

IMHO those two things address vastly different use-cases. The COPY WHEN
case deals with filtering data while importing them into a database,
while what you're describing seems to be more about querying data stored
in a CSV file. But we already do have a solution for that - FDW, and I'd
say it's working pretty well. And AFAIK it does give you tools to
implement most of what you're asking for. I don't see why should we bolt
this on top of COPY, or how is it an alternative to COPY WHEN.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


В списке pgsql-hackers по дате отправления:

Предыдущее
От: Tomas Vondra
Дата:
Сообщение: Re: zheap: a new storage format for PostgreSQL
Следующее
От: Tomas Vondra
Дата:
Сообщение: Re: WIP Patch: Add a function that returns binary JSONB as a bytea