July 16

Free OCR software? You may already have it

Microsoft Office Document Imaging

By accident, I stumbled across Microsoft Office Document Imaging (MODI). It's included with Microsoft Office Tools (the "Microsoft Office \ Microsoft Office Tools" folder in the Start menu; the default installation location is "C:\Program Files\Common Files\Microsoft Shared\MODI\11.0\"). The interface looks like a "My First VB5 Application" reject, but it works great.

It handles scanned documents via TWAIN. The image import's a bit lame - it only handles TIF files. You can convert to TIF in just about any graphics application (e.g. MSPAINT - open the file, Save As TIF file). An easier method is to just copy the image to the clipboard and paste as a new page into MODI.
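If you have a pile of images to convert, the TIF step can be scripted. Here's a minimal sketch, assuming Python with the third-party Pillow package installed; the function names are made up for illustration, and converting to plain RGB first is a guess at what MODI is happiest with, not something I've verified.

```python
from pathlib import Path


def tif_path_for(source):
    """Return the source path with its extension swapped for .tif."""
    return Path(source).with_suffix(".tif")


def convert_to_tif(source):
    """Save a TIF copy of an image next to the original and return its path."""
    # Pillow is a third-party package (pip install Pillow). It is imported
    # here so the path helper above still works without it.
    from PIL import Image

    target = tif_path_for(source)
    with Image.open(source) as img:
        # Drop any alpha channel or palette; plain RGB is an assumption
        # about what MODI accepts most reliably.
        img.convert("RGB").save(target, format="TIFF")
    return target
```

Calling `convert_to_tif("scan.png")` would write `scan.tif` in the same folder, ready to open in MODI.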

Here's a quick walkthrough of how I grabbed some text from a PDF.

Step 1. Using Cropper (output set to Clipboard), I captured the region of text I wanted to OCR

Step 2. I opened Microsoft Office Document Imaging and loaded my image with Page / Paste Page

Step 3. I ran the OCR process by clicking on the "funky eye" toolbar button (or in the Tools menu)

Step 4. I clicked the Export to Word toolbar button

Step 5. I copied the text and pasted it where I wanted it

In this case, it was an e-mail. I've done the same thing to grab SQL or C# code which I then paste into the editor and compile (Ctrl-F5 for SQL, Ctrl-Shift-B for C#) to catch the things that didn't make it through the OCR cleanly.
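Some of what doesn't survive the trip is predictable enough to clean up before compiling. The sketch below assumes the usual culprits are curly quotes, long dashes and non-breaking spaces picked up on the way through OCR and Word. That list is my assumption, not an exhaustive catalogue, and the function names are hypothetical.

```python
from typing import List

# Characters that commonly replace plain ASCII in OCR'd or Word-pasted code.
# Illustrative only; extend it for whatever your own documents produce.
REPLACEMENTS = {
    "\u201c": '"',  # left double quotation mark
    "\u201d": '"',  # right double quotation mark
    "\u2018": "'",  # left single quotation mark
    "\u2019": "'",  # right single quotation mark
    "\u2013": "-",  # en dash
    "\u2014": "-",  # em dash
    "\u00a0": " ",  # non-breaking space
}


def normalize_ocr_code(text: str) -> str:
    """Replace the known look-alike characters with their ASCII equivalents."""
    return text.translate(str.maketrans(REPLACEMENTS))


def suspicious_lines(text: str) -> List[int]:
    """Return 1-based numbers of lines that still contain non-ASCII characters."""
    return [
        number
        for number, line in enumerate(text.splitlines(), start=1)
        if not line.isascii()
    ]
```

Run `normalize_ocr_code` on the pasted text first, then check `suspicious_lines` on the result to see where to look by hand. It won't catch a `1` that was read as an `l`, so the compile step still earns its keep.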

I haven't tried it, but apparently you can automate MODI from .NET.
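MODI exposes a COM object model, so it can be driven from any COM-capable language, not just .NET. Here's a rough, untested sketch in Python rather than C#, assuming Windows, a registered MODI install, and the third-party pywin32 package. The function name and the `dispatch` parameter are my own; the `dispatch` hook is only there so the logic can be exercised without MODI present.

```python
MI_LANG_ENGLISH = 9  # numeric value of MiLANGUAGES.miLANG_ENGLISH


def ocr_tif_with_modi(tif_path, dispatch=None):
    """OCR a TIF file through MODI's COM interface and return the text."""
    if dispatch is None:
        # pywin32 only exists on Windows, so import it lazily.
        import win32com.client
        dispatch = win32com.client.Dispatch

    doc = dispatch("MODI.Document")
    doc.Create(tif_path)  # load the file into the document object
    try:
        # Arguments: language, auto-orient the image, straighten the image.
        doc.OCR(MI_LANG_ENGLISH, True, True)
        pages = doc.Images
        # Each page's recognized text lives under Layout.Text.
        return "\n".join(
            pages.Item(i).Layout.Text for i in range(pages.Count)
        )
    finally:
        doc.Close(False)  # close without saving the OCR results to the file
```

Something like `print(ocr_tif_with_modi(r"C:\scans\page.tif"))` (a made-up path) would then stand in for Steps 2 through 5.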