Re: [Qgis-user] OCR
From | Mike Ellsworth |
---|---|
Subject | Re: [Qgis-user] OCR |
Date | |
Msg-id | BANLkTinrieWobY61d1nSjsZfUXJROye_Fg@mail.gmail.com |
In reply to | Re: [Qgis-user] OCR (Ramon Andinach <custard@westnet.com.au>) |
List | pgsql-novice |
On Wed, May 4, 2011 at 6:04 AM, Ramon Andinach <custard@westnet.com.au> wrote:
>
> On 04/05/2011, at 17:18 , ALT SHN wrote:
>
>> Hello,
>>
>> This might seem a Little off topic, but maybe someone here can help me.
>>
>> I need to extract toponomical data from old digitized paper maps. I wish to explore Optical character recognition (OCR).
>>
>> Does anyone has a suggestion/experience with this kind of challenge?
>>
>> Thank you,
>>
>> André Mano
>
> I'd argue about the "little". I've spent a lot of the last few weeks arguing with OCR software about tables of sample data from old reports, that become point data to plot. Perfectly relevant :)
>
> I've never tried getting data from digitized maps, but I'll offer the following generalisations in case it helps. Generally, I have OCR programmes pass me the results as plain text. I lose the formatting, but I don't have to fix stupid guesses about the formatting.
>
> This is from my experience, so you may find different.
>
> 1. OCR loves paragraphs.
> 2. Different OCR programmes handle column text differently. Some understand columns, some just assume L->R straight across both columns.
> 3. OCR does not get along with handwritten anything. (Unless the person was extra-extra neat and consistent in their writing, and even then it's a maybe.)
> 4. OCR on tabular data works best if the data is lined up in columns, and doesn't have random big gaps.
> 5. OCR will almost certainly be confused if there is a line on your map running through or near a word.
> 5a. Actually lines could confuse it quite a bit - I remember one that tried to recreate an in-line sketch map out of ascii characters. Quite amusing.
> 6. You *will* need to check the results.
>
> I'd love to hear what OCR makes of maps. Very curious.
>
> -ramon.
> --

Going along with the poster above, I think you may have problems if any of your text is skewed or 'wraps' in any way such that any character is not horizontal to the scan. Recognizing 7 out of 10, then cleaning up, may be more time consuming than data entry.

You may be able to 1) scan 2) drag a 'box' over the various data points and assign an x,y for the box 3) assign each box an identifier 4) then data enter per box. At least you'll have a loose representation of the data. I know one thing I'd google for is 'OCR skewed text' and see if there is an app that can handle that potential problem before I went much further.

--
Mike Ellsworth
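For what it's worth, the scan / box / identifier / data-entry steps above could be kept track of with something as small as the sketch below. It is only an illustration under assumptions of mine: the names `LabelBox`, `centre` and `boxes_to_csv` are hypothetical, the x,y values are taken to be pixel positions on the scan (not map coordinates), and the place name is typed in by hand rather than produced by OCR.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class LabelBox:
    """One hand-drawn box around a place name on the scanned map."""
    box_id: str     # identifier assigned to the box (step 3)
    x: float        # box centre, in scan pixels (step 2) -- an assumption
    y: float
    text: str = ""  # place name typed in by hand (step 4)


def centre(left, top, right, bottom):
    """Return the centre point of a box given its pixel edges."""
    return ((left + right) / 2, (top + bottom) / 2)


def boxes_to_csv(boxes):
    """Serialise the boxes as CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["box_id", "x", "y", "text"])
    for b in boxes:
        writer.writerow([b.box_id, b.x, b.y, b.text])
    return buf.getvalue()


if __name__ == "__main__":
    # Hypothetical box dragged over one label on the scan.
    x, y = centre(10, 20, 30, 40)
    print(boxes_to_csv([LabelBox("b1", x, y, "Lisboa")]))
```

A CSV like this still carries pixel positions only, so it would need georeferencing before the points line up with anything else.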