The Book Designer

Practical advice to help build better books

by selfpublishing.com

Capturing and Using Text from Old Documents in New Book Production

by Joel Friedlander on September 2, 2009


This post is the second in a series about the creation of a new book. To see all the articles in the series, click “The Journey of a Book” tab at the top of the page.

Well, I had a “manuscript” for my next book (see the previous post in this series). But the old word processing files were long gone, and I had to find a way to make all this potentially valuable text usable. When documents like this are scanned, the text is stored as an image, a “picture” of the page, so you cannot edit it; you can only place it in a document exactly as it looked when it was scanned. That wouldn’t work for me, so I started looking for another solution.

Heading to Kinko’s

At one time there was a major market for OCR software (Optical Character Recognition: “intelligent” software that analyzes the letterforms in these “pictures” and estimates what each word is), and I assumed the vendors had kept developing it. But it turns out that because a growing share of documents are created on computers in the first place, the need for this software has shrunk. For the Macintosh there were only two programs available, and both cost quite a bit more than I was willing to spend, since I had only the one project to convert to text.

Next I called our local FedEx Kinko’s to see if they offered this service, but no luck. They only scan to PDF, which is as far as I had already gotten. I spent the next two days trying to find shareware or freeware OCR software, and I thought I would have to boot up my old Wintel box just to get this done, an unappetizing prospect. Finally, I stumbled on VelOCRapter, an inexpensive program that put an “inbox” on the desktop. I just dropped the files in the inbox and, after a few minutes of churning, out popped a file with the image “translated” into real text! It looked like I was in business. There was only one problem. Here is what the original file and the OCR’s text looked like:

A page as it looked after being scanned to PDF
The same page after OCR scanning

In some places it was hard to actually see the text because the scanning process had left so much “junk” in the file. You can also see that the OCR software interpreted each line as a separate paragraph, so there would be thousands of “returns” to take out, and the process would be tough to automate.
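The stray “returns” problem can at least be attacked with a simple heuristic. The Python sketch below is not what the author used (the post does not say how the cleanup was eventually done); it assumes that the last line of a paragraph usually ends in sentence punctuation and stops noticeably short of the right margin, and that a trailing hyphen marks a word split across two lines. The function name `unwrap_ocr_lines` and the `short_ratio` threshold are hypothetical choices for illustration.

```python
import re

# A line "ends a sentence" if its last character is . ! or ?,
# optionally followed by a closing quote or parenthesis.
SENTENCE_END = re.compile("[.!?][\"'\u201d)]?$")


def unwrap_ocr_lines(text: str, short_ratio: float = 0.8) -> str:
    """Join hard-wrapped OCR lines back into paragraphs (hypothetical helper).

    Assumed heuristic: a line closes a paragraph if it is blank, or if it
    ends with sentence punctuation AND is shorter than short_ratio times
    the longest line in the text.
    """
    lines = [line.strip() for line in text.splitlines()]
    longest = max((len(line) for line in lines), default=0)

    paragraphs: list[str] = []
    current = ""
    for line in lines:
        if not line:
            # A blank line always ends the paragraph in progress.
            if current:
                paragraphs.append(current)
                current = ""
            continue
        if current.endswith("-"):
            # Rejoin a word hyphenated across the line break.
            # Caveat: this also merges real compounds split at their hyphen.
            current = current[:-1] + line
        elif current:
            current += " " + line
        else:
            current = line
        if SENTENCE_END.search(line) and len(line) < short_ratio * longest:
            paragraphs.append(current)
            current = ""
    if current:
        paragraphs.append(current)
    # Separate the rebuilt paragraphs with a single blank line.
    return "\n\n".join(paragraphs)


if __name__ == "__main__":
    page = (
        "This is the first line of a para-\n"
        "graph that keeps going across the\n"
        "page until it ends.\n"
        "A second paragraph starts here and\n"
        "runs on to its end too.\n"
    )
    print(unwrap_ocr_lines(page))
```

On that sample the five OCR lines come back as two paragraphs. The heuristic still misfires on short lines such as headings or dialogue, and on paragraphs whose last line happens to run nearly to the margin, so the output would still need a proofreading pass.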

It looked like we would need some word processing “heavy lifting” before this project could really get off the ground. But considering that I hadn’t had to write a word, I was still ahead of the game.

Comments

  1. Joseph Gregory says

    April 26, 2011 at 9:26 pm

    Ten years ago, while taking the B train back to Brooklyn after a long day at work, my muse kicked in and a short but important exchange between two characters in my story started to flow through me. Having no paper with me (shameful), I ripped off a piece of a paper bag I was carrying and began to write. Would you believe that after all this time I came across this ripped bag with my scribbled convo on it?
    Moral of the story: Save everything you write!

    • Joel Friedlander says

      April 26, 2011 at 10:55 pm

      Nice story, Joseph, and a valuable lesson, thanks for sharing.


Copyright Self Publishing School All Rights Reserved. © 2022