Re: Finding Errors in .csv Input Data

Поиск
Список
Период
Сортировка
От Rich Shepard
Тема Re: Finding Errors in .csv Input Data
Дата
Msg-id alpine.LNX.2.00.1102230602220.17955@salmo.appl-ecosys.com
обсуждение исходный текст
Ответ на Re: Finding Errors in .csv Input Data  ("David Johnston" <polobo@yahoo.com>)
Ответы Re: Finding Errors in .csv Input Data  ("David Johnston" <polobo@yahoo.com>)
Re: Finding Errors in .csv Input Data  (Adrian Klaver <adrian.klaver@gmail.com>)
Список pgsql-general
On Tue, 22 Feb 2011, David Johnston wrote:

> The data and table structure provided do not seem to correlate.

David,

   That's the problem. However, they should match since they came from the
same .mdb file.

> Regardless, if you changed the delimiter to "|" from "," it is possible that
> you converted an embedded "," in one of the textual fields into a "|" when
> you should not have.

   This is why I worked out a processing flow that avoided this problem in
_almost_ every case. The 80 remaining rows do not appear to have this
problem. At least, it's not immediately obvious they do which is why I want
to learn how to identify where (and why) the data get out of sync with the
schema.

> I suggest opening up a testing file (one with the 80 malformed records and
> 10 to 20 good/control records) in an Excel or equivalent spreadsheet and
> import/text-to-columns using the "|" delimiter.  You will be able to
> quickly see rows with extra columns and specifically where those extras
> are originating.  Then you can decide on how to fix the problem.  I would
> suggest manually changing each incorrect "|" into a "," as a form of
> self-punishment for improper data conversion - but whatever works for you.

   Each of the 80 rows have extra columns, at least one.

   Here are the same two rows from the original .csv export and the
transformation:


68670,724,"",0,,11/27/1948,"D","N","N","","H","C","32031","087","N18","18N","E20","20E","07","","","","MD",39.44,119.77,"NV003","M","KAIPER,
RL","","","","SIERRA MANOR","","",11/15/1948,"D","",,106,,"",106,6.62,0,60,102,1,12.00,,30.00,,,"B","G","G","AIR
COMPRESSORTESTED 30 GPM ALSO","","3","MEL MEYER","RT 1 BOX
10RENO,",,3,"NV003","JSWINGHOLM",1/16/2003,"",,"F",11/11/1948,,,261013.36,4369139.23

68670|724||0||11/27/1948|D|N|N||H|C|32031|087|N18|18N|E20|20E|07||||MD|39.44|119.77|NV003|M|KAIPER, R L|||SIERRA
MANOR||11/15/1948|D|||106|||106|6.62|0|60|102|1|12.00||30.00|||B|G|G|AIRCOMPRESSOR TESTED 30 GPM ALSO||3|MEL MEYER|RT 1
BOX10 RENO|||3|NV003|JSWINGHOLM|1/16/2003|||F|11/11/1948|||261013.36|4369139.23 

   When I compared the two I saw no difference. Why the latter does not want
to fit the schema I don't know.

   I can send the schema again and the whole 80-row file to anyone willing to
hemp me find the problems.

Rich


В списке pgsql-general по дате отправления:

Предыдущее
От: Ioana Danes
Дата:
Сообщение: Re: Logged statement apparently did not commited...
Следующее
От: "David Johnston"
Дата:
Сообщение: Re: Finding Errors in .csv Input Data