The Book Designer

Practical advice to help build better books

by selfpublishing.com

A Long Slog through the OCR Swamp

by Joel Friedlander on September 5, 2009


This post is the third in a series about the creation of a new book. To see all the articles in the series, click “The Journey of a Book” tab at the top of the page.

While contemplating the hours it was going to take to make all the corrections to the files produced by VelOCRaptor OCR (the optical character recognition software I had used to turn the original PDF files into Word files), I started to mentally calculate how long it would take to get this sudden “new” book I had discovered into print.

This type of correction is frustratingly slow. If, like me, you are constantly interrupted during your work day by ringing phones, pinging emails, kids who think they ought to have lunch, and dogs who apparently believe that their home comes with a private doorman, it can become so frustrating that it slides toward the bottom of the “to-do” pile and just stays there.

Luckily, I had a friend, Arisha Wenneson of Wenneson Services, who is meticulous and had time to do the job. We soon arranged a fee that was agreeable, and I happily zipped the files and sent them off. Placed side by side on the screen, the PDF showing what should be in the file and the OCR text in a Microsoft Word document would at least be convenient to compare during correction. As the email progress bar clicked down, I breathed a sigh of relief. Now I knew there was one big obstacle that wouldn’t be holding me back.

Sure enough, in a few days the new Word files, clean and shiny, started to show up in my inbox. Here’s the result:

Before correction and after

Arisha had done an outstanding job. From the mess of OCR mayhem she had produced beautiful, accurate, junk-free Word files. I began to think this book would become a reality after all.

Of course, while looking over the files (which were now much easier to read) I realized that it would not be possible to publish the book without editing. The words, phrases, interjections, hemming and hawing that fill up our spoken communication had all been preserved in the transcript. But who wants to read a lot of filler, the things you say when you’re standing in front of a room of people trying to remember the point you were making?

There was no avoiding it. I would have to sit and edit every paragraph to get rid of the remaining “junk” that made the lectures tough reading. If I was going to be kind to my prospective readers, a goal all authors should aspire to, I would have to get out the “blue pencil” and get to work.

Next up: Editing, editing, editing

Filed Under: Book Production, Journey of a Book, Self-Publishing

Reader Interactions

Comments

  1. jedidiah manowitz says

    October 31, 2018 at 7:19 pm

    there is no tab at top of page for journey of a book

    • Joel Friedlander says

      November 1, 2018 at 1:57 pm

      jedidiah,

      Well, there was one in 2009 when this article was published. It’s moved to the right sidebar in the list of “Topics.”

  2. Joseph Gregory says

    April 26, 2011 at 9:32 pm

    “kids who think they ought to have lunch”
    LMAO


Trackbacks

  1. Is It Worth Converting an Old Book Into an eBook? | ARCHITAMENT says:
    June 3, 2013 at 6:08 pm

    […] clean copy will need to go to an OCR (optical character recognition) scanning service. They will scan each page of text and create a […]


Why?
"Writers change the world one reader at a time. But you can't change the world with a book that's still on your hard drive or in a box under your bed. This blog exists to help you get that book into people's hands."
—Joel Friedlander

Copyright Self Publishing School All Rights Reserved. © 2022