Re: exposing COPY API

From: Andrew Dunstan
Subject: Re: exposing COPY API
Msg-id: 4D4C0645.2010109@dunslane.net
In reply to: Re: exposing COPY API  (Itagaki Takahiro <itagaki.takahiro@gmail.com>)
List: pgsql-hackers

On 02/04/2011 05:49 AM, Itagaki Takahiro wrote:
> Here is a demonstration to support jagged input files. It's a patch
> on top of the latest patch. The newly added API is:
>
>    bool NextLineCopyFrom(
>          [IN] CopyState cstate,
>          [OUT] char ***fields, [OUT] int *nfields, [OUT] Oid *tupleOid)
>
> It just returns separated fields in the next line. Fortunately, I need
> no extra code for it because it is just extracted from NextCopyFrom().

Thanks, I'll have a look at it, after an emergency job I need to attend 
to. But the API looks weird. Why are fields and nfields OUT params? The 
issue isn't decomposing the line into raw fields. The code for doing 
that works fine as is, including on jagged files. See commit 
af1a614ec6d074fdea46de2e1c462f23fc7ddc6f which was done for exactly this 
purpose. The issue is taking those and composing them into the expected 
tuple.

> I'm willing to include the change into copy APIs,
> but we still have a few issues. See below.
>
> On Fri, Feb 4, 2011 at 16:53, Andrew Dunstan<andrew@dunslane.net>  wrote:
>> The problem with COPY FROM is that nobody's come up with a good syntax for
>> allowing it as a FROM target. Doing what I want via FDW neatly gets us
>> around that problem. But I'm quite OK with doing the hard work inside the
>> COPY code - that's what my working prototype does in fact.
> I think it is not only a syntax issue. I found that it is hard to
> support the FORCE_NOT_NULL option for extra fields. See FIXME in the patch.
> It is a fundamental problem to support jagged fields.

It's not a problem at all if you turn the line into a text array. That's 
exactly why we've been proposing it for this. The array has however many 
elements are on the line.
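
To illustrate the point, here is a minimal sketch in plain C. It is not the COPY code: split_line is a hypothetical helper, it splits in place on a single delimiter, and it ignores quoting and escaping entirely. It only shows that the resulting array simply has as many elements as the line has fields.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper, not part of PostgreSQL: split "line" in place on
 * "delim" and return a malloc'd array of pointers into the line.
 * No CSV quoting or escaping is handled. The caller frees the array.
 */
char **
split_line(char *line, char delim, int *nfields)
{
    int     count = 1;
    int     i = 0;
    char   *p;
    char  **fields;

    assert(line != NULL && nfields != NULL);

    /* One field, plus one more for every delimiter on the line. */
    for (p = line; *p; p++)
        if (*p == delim)
            count++;

    fields = malloc(count * sizeof(char *));
    if (fields == NULL)
        return NULL;

    /* Terminate each field at its delimiter and record where it starts. */
    fields[i++] = line;
    for (p = line; *p; p++)
    {
        if (*p == delim)
        {
            *p = '\0';
            fields[i++] = p + 1;
        }
    }

    *nfields = count;
    return fields;
}
```

With the "AB,CD,EF" line from the example, this yields three elements regardless of how many columns the target table has.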

>> One thing I'd like is to have file_fdw do something we can't do another
>> way. Currently it doesn't, so it's nice but uninteresting.
> BTW, how do you determine which field is shifted in your broken CSV file?
> For example, the case where you find "AB,CD,EF" for a 2-column table.
> I could provide a raw CSV reader for jagged files, but you still have to
> cook the returned fields into a proper tuple...
>

See above. My client, who has been dealing with this situation for 
years, treats underflowing fields as null and ignores overflowing 
fields. They would do the same if the data were delivered as a text 
array. It works very well for them.
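
As a rough sketch of that policy in plain C (fit_fields is a hypothetical name, and this is not code from the patch or from my branch): given the raw fields of one line, fill exactly ncols output slots, using NULL for missing fields and dropping any extras.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper, not part of PostgreSQL: map the raw fields of one
 * line onto exactly "ncols" output slots. Missing (underflowing) fields
 * become NULL; extra (overflowing) fields are ignored.
 * Returns the number of fields that were ignored.
 */
int
fit_fields(char **fields, int nfields, char **out, int ncols)
{
    int     i;

    assert(nfields >= 0 && ncols >= 0);

    for (i = 0; i < ncols; i++)
        out[i] = (i < nfields) ? fields[i] : NULL;

    return (nfields > ncols) ? nfields - ncols : 0;
}
```

For the "AB,CD,EF" line and a two-column table, this keeps AB and CD and ignores EF; a line with only "AB" gives AB and a null.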


See <https://github.com/adunstan/postgresql-dev/tree/sqlmed2> for my dev 
branch on this.


cheers

andrew



