OCR using OmniPage Pro
Optical Character Recognition (OCR) allows you to take type-written
or printed material and, via the scanner, import it as text into the computer.
The text can then be used for a variety of purposes, such as word processing
and publishing applications. Given the recent popularity of converting
printed materials to on-line publications, OCR can prevent the need for
excessive amounts of re-typing.
OmniPage only works for languages which use a Roman script, such as English,
French, German, Italian, Latin and Spanish. If you want to perform OCR on
Russian, you'll need to use MacTiger. We do not currently own any OCR software
for Chinese or Japanese.
Using OmniPage Pro (on Mellon One in the FDS) for OCR:
1. Turn on the scanner. Remember that for regular originals it is only
necessary to turn on the switch on the base of the scanner; the switch
on the lid is only used to provide backlighting for transparencies.
2. Start OmniPage Pro. It can be opened from the "Applications"
submenu under the Apple menu on Mellon One. When the program opens, a toolbar
should appear (it is recognizable by a button on the left with the word
"AUTO" written in blue). If it doesn't, you can turn it on by
selecting "Show Toolbar" from the "Window" menu.
3. There are three parts to performing OCR, which can be completed either
manually (one at a time) or as an automated process. These parts are:
- acquiring the image - that is, scanning the original document into
the computer as a graphic
- identifying zones - Zones tell the computer the order in which the
text falls on the page. A typical order for a page in which the text is
printed in two columns would be:
  - book title or other header in the top margin
  - title of the document
  - left column
  - right column
Defining zones properly is especially important for a document in which
the text is frequently broken up by photos and other items.
- performing OCR - Once zones are defined, the computer examines the scanned
image and converts recognizable characters into text.
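The zone order described above for a two-column page can be sketched in a few lines of Python. This is only an illustration of the idea of reading order, not how OmniPage itself works: the Zone class, the reading_order function, and the page measurements are all hypothetical, and the sketch assumes that any zone wider than half the page is a header or title sitting above the columns.

```python
from dataclasses import dataclass


@dataclass
class Zone:
    label: str
    left: float    # left edge, in inches from the left side of the page
    top: float     # top edge, in inches from the top of the page
    width: float   # zone width in inches


def reading_order(zones, page_width):
    """Return zones in reading order for a simple two-column page.

    Assumption: a zone wider than half the page is a header or title
    above the columns; every other zone is in the left or right column.
    """
    def key(zone):
        if zone.width > page_width / 2:
            group = 0                      # full-width header or title
        elif zone.left + zone.width / 2 < page_width / 2:
            group = 1                      # left column
        else:
            group = 2                      # right column
        return (group, zone.top)

    return sorted(zones, key=key)


# Hypothetical zones, listed out of order on purpose.
page = [
    Zone("right column", left=4.4, top=2.0, width=3.3),
    Zone("document title", left=0.8, top=1.2, width=6.9),
    Zone("left column", left=0.8, top=2.0, width=3.3),
    Zone("book title", left=0.8, top=0.4, width=6.9),
]

ordered_labels = [zone.label for zone in reading_order(page, page_width=8.5)]
print(ordered_labels)
```

A page broken up by photos or sidebars would not fit this simple rule, which is why checking the zones by hand matters for such documents.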
4. The easiest way to perform OCR is having the computer complete these
three steps automatically. If you try it using the "AUTO" feature
but find that the decisions the computer makes aren't the ones most appropriate
for your purposes, you can see "fine-tuning and manual adjustment."
To automatically perform OCR:
- Place the document in the scanner, lining the top right corner up against
the green & white arrow as indicated.
- Double check the settings in the toolbar - the three buttons to the
right of the "AUTO" button correspond to the three steps outlined
in number 3, above. They should read "Scan Image," "Auto
Zones," and "Perform OCR" respectively. If they do not,
click on the down-arrows below the buttons and select the appropriate settings
from the drop-down menu.
- To facilitate OCR, select the language in which your original is written.
Do this by choosing "Select Languages" from the "Settings"
menu. To select multiple languages (for example, if your document contains
English and French text), hold down the command (apple) key when clicking.
When the appropriate languages have been selected, click "OK."
- Click the "AUTO" button on the toolbar. OmniPage will now
execute the three steps explained above one after the other.
- The results of the OCR will appear in two windows marked "untitled."
The window on the left shows the results of the first and second steps--that
is, the image and the zones. The window on the right is the final result--the
recognized text. If OmniPage has scanned the correct portion of the original
and established the zones in the correct order, you should continue to
step 5. Otherwise, you can make adjustments (as explained below) and repeat
the process.
5. You can now edit the text recognized by OmniPage. Some notes about
editing the text produced by OCR:
- You may find that if you did not select the proper language, diacritical
marks may have been omitted or incorrectly interpreted. Or, in some cases,
OmniPage cannot easily distinguish between similar letters, such as a lowercase
"l" and a capital "I." OmniPage's accuracy in cases
such as this one depends a great deal on the quality of the original and
on the font in which the original document was printed.
- Words not in OmniPage's dictionaries (that is, potentially misspelled
words) appear in green in the righthand window. If you wish, you can go
through this text now and make corrections or adjustments.
- If OmniPage could not read a word or a character, it identifies this
item by preceding it with a "reject" character, which, by default,
appears as "~"--this happens frequently when lists in the original
document are set off by bullets, boxes, or other special characters.
- If you wish, you can check the OCR with a tool similar to a word processor's
spell check. The button to begin this process is at the bottom right of
the toolbar--it is a picture of a page with a blue checkmark next to it.
- If OmniPage recognized any unwanted photographs or graphics on your
original document and imported them into your text in the righthand window,
you can get rid of them now (by double-clicking on them and hitting the
"delete" key on the keyboard). If the text you are scanning will
be used for a publication or the creation of a web page, it is better to
save the text only at this time, and add graphics and photos later.
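If you save the text and want help spotting the "l" versus "I" confusion mentioned above, a short script can list likely candidates. The sketch below is a rough heuristic and not an OmniPage feature: it assumes English text, and the suspect_words function and the sample sentence are made up for illustration. It flags any word in which a capital "I" directly follows a lowercase letter, which will also catch some legitimate names.

```python
import re

# A capital "I" right after a lowercase letter is rare in English words,
# so it hints that a lowercase "l" may have been misread.
SUSPECT = re.compile(r"\b\w*[a-z]I\w*\b")


def suspect_words(text):
    """Return the words that contain a lowercase letter followed by a capital I."""
    return SUSPECT.findall(text)


sample = "Save the fiIe in your personal folder. I will caIl it later."
suspects = suspect_words(sample)
print(suspects)
```

Each flagged word still needs to be checked by eye against the original document.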
6. When you are satisfied with your text, you can save it as a file
to be opened in your word processor or other application. To do this:
- Click the "Save As" button on the toolbar.
- Change to an appropriate folder--such as your personal "User"
folder, your NetWare Home folder, or a floppy disk you've inserted.
- Type a file name for the text in the name box.
- Choose a file type from the drop-down menu. For example, if you plan
to load this text using Microsoft Word 6.0, choose that. If you are unsure,
choose "ASCII Text," since just about any application can read
this file type.
- Click "OK."
7. You can now quit OmniPage Pro (by choosing "Quit" from
the "File" menu) if you wish.
8. Once you have loaded your scanned text into your word processor,
it is a good idea to re-check the spelling and appearance of the text.
Searching your document for all instances of the "~" character
may help you catch errors or inaccuracies.
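If you saved the text as ASCII, the search for the "~" character can also be done with a short script. This is a minimal sketch that assumes the reject character is still the default "~"; the find_rejects function and the sample text are hypothetical, and in practice you would read the text from the file you saved.

```python
def find_rejects(text, reject_char="~"):
    """Return (line_number, line) pairs for every line containing the reject character."""
    return [
        (number, line)
        for number, line in enumerate(text.splitlines(), start=1)
        if reject_char in line
    ]


# Sample text standing in for a saved OCR result; bullets in the
# original were replaced by the reject character.
sample = "Items needed:\n~ scanner\n~ original document\nPlace the page face down."
flagged = find_rejects(sample)
for number, line in flagged:
    print(f"line {number}: {line}")
```

The line numbers make it easier to find each flagged spot again in your word processor.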
- Adjusting how OmniPage executes the three steps
- Fine-tuning settings
- Drawing your own zones
- Scanning multiple pages of a single document (choosing to make each
page a separate doc vs. compiling it all into one)