Book Page Layout Preparation: Cleaning Up Your Word Files

by Joel Friedlander on November 10, 2010

Do you ever wonder how the Plain Jane word processing files you turn over to a book designer or typesetter get transformed into the graceful typography and measured lines of a printed book?

Well, that process starts with the most basic and essential first step: file prep. And no matter how routine this step is, it can have a major influence on how well the finished book is constructed, and how efficient it is to lay out.

Follow the Steps to Clean Those Files

It’s likely the manuscript was examined in the bidding process, but once the book is actually in production and the designer has a truly final version of the file, it’s time to take a close and thorough look at the file.

This examination is going to involve:

  • Analysis of the content hierarchy and the way the author has communicated that to the reader. This includes parts, chapters, subheads and other elements intrinsic to the book’s structure.
  • Inventory of the formats used in the book. We’ll count up each set of formats and how they relate to each other in the typographic plan for the book.
  • Perusal of the actual word processing files that the author has presented. We need to be working from the final files to do this properly, and we’re looking at how the document was created: Were formatting styles used? How “rough” is the manuscript?

But no matter what we find, the first necessary step in making the transformation from word processing files to book typography is cleaning up what’s been left in the files.

Over the years I’ve developed standard routines for handling lots of text quickly and efficiently. Most of these routines rely on Word’s powerful Find and Replace function.

It wouldn’t be too hard to write a whole series of articles about this function and the many ways it can be used to massage, clean, adjust and reconfigure massive amounts of text blindingly fast.

What I want to go over here is the basic manuscript cleanup that every book goes through before it gets transported over to the software used for book layout.

Necessary Steps, Necessary Order

It’s important to do these steps in a specific order, and I think you’ll see why as we go through them.

The first task is to take out all the extra spaces in the file.

Note: These instructions apply to all typographic material. If there are tables or similar elements in the book you’ll have to examine how they were created before you use these find/replace techniques. If possible, move the tabular material to a different file, then clean the remaining text before reassembling the file.

There’s a persistent habit, born in the days of typewriters, of putting two spaces between sentences. If you leave these in you won’t be happy with the result, and two spaces between sentences won’t make people think your book was done by professionals. Since the end of a sentence is already signified by a period and a space, no other spaces are necessary.

In the Find and Replace dialog, a space character is just as invisible as it is in your file; you can’t see it unless you put your cursor into the “find” or “replace” field. So make sure you put 2 spaces in the “find” field and 1 space in the “replace” field. We’re telling Word to look for any occurrence of 2 spaces next to each other. This will also find runs of 3 spaces, 4 spaces, and so on, since any such run contains 2 spaces together.

The “replace” field tells Word to delete the 2 spaces and instead replace them with 1 space.

Starting at the top of the file, I run this first. Because Word replaces each pair of spaces and then moves on, a run of 3 or 4 spaces is only shortened on each pass, so you will likely have to hit “replace all” more than once to root out all these extra spaces. There will be hundreds of them.
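To see why more than one click may be needed, here is a small Python sketch. It assumes a single “replace all” pass behaves like Python’s str.replace, which replaces non-overlapping matches from left to right, so a run of four spaces only shrinks to two on the first pass. The function name collapse_double_spaces is hypothetical:

```python
def collapse_double_spaces(text: str) -> tuple[str, int]:
    """Replace 2 spaces with 1, repeating "replace all" until none remain.

    Returns the cleaned text and the number of passes it took.
    """
    passes = 0
    while "  " in text:
        # One pass: like a single click of "replace all", this replaces
        # non-overlapping pairs of spaces from left to right.
        text = text.replace("  ", " ")
        passes += 1
    return text, passes


sample = "One space.  Two spaces.    Four spaces."
cleaned, passes = collapse_double_spaces(sample)
print(cleaned)  # One space. Two spaces. Four spaces.
print(passes)   # 2
```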

Note: You can access all of the codes that Word uses through the drop-down menus at the bottom of the “find/replace” dialog. But if you know the few you’ll use regularly, it’s faster to just type them in yourself.

Access to almost every formatting command

Next up is getting rid of other stray spaces. These are usually invisible in the Word file and seem innocuous. But in page layout, all of these extra characters are a source of potential problems.

Eliminating trailing spaces: you can see the space when it's highlighted

Our next “find/replace” will be for trailing spaces. These are at the ends of paragraphs, just before the end-of-paragraph marker.

In the “find” field we’ll hit the spacebar and then enter ^p. (You do this by using the ^ character on the keyboard, shift-6.)

This code stands for a paragraph return, the end-of-paragraph marker. Every time this character occurs it signals another discrete paragraph. This will be important later in the process.

In the “replace” field we’ll put just the paragraph symbol: ^p without the space.

What we’re telling Word here is this: “Go look throughout this file and find anyplace there’s a space followed by a paragraph return. Delete both of them and replace them with only a paragraph return.” Because the double-space pass has already reduced every run of spaces to a single space, you will only have to perform this search once, and Word will report back how many instances have been changed.
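Here is the same trailing-space pass as a Python sketch, assuming ^p is modeled as a newline ("\n"). Counting the matches before replacing stands in for the tally Word reports. The function name strip_trailing_spaces is hypothetical, and the sample assumes the double-space pass has already run:

```python
def strip_trailing_spaces(text: str) -> tuple[str, int]:
    """Find a space followed by a paragraph return (" ^p") and keep only the return."""
    count = text.count(" \n")  # how many replacements Word would report
    return text.replace(" \n", "\n"), count


sample = "A paragraph with a stray space. \nAnother one. \nA clean one.\n"
cleaned, count = strip_trailing_spaces(sample)
print(count)  # 2
```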

Eliminating leading spaces.

Next we’ll look for a related problem, spaces that have crept in at the very beginning of a paragraph. Although it sounds like this would be rare, it’s actually quite common.

Here we’ll find ^p followed by a space (that is, a paragraph return then a space) and replace it with ^p, keeping the paragraph return and eliminating the space.
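The Python sketch below models the leading-space pass, again with ^p modeled as "\n"; the function name strip_leading_spaces is hypothetical. One edge case worth knowing: a “^p followed by a space” search can’t reach a space at the very start of the file, since the first paragraph has no return before it. The sketch handles that case separately; in Word you would simply delete such a space by hand.

```python
def strip_leading_spaces(text: str) -> str:
    """Find a paragraph return followed by a space ("^p ") and keep only the return."""
    text = text.replace("\n ", "\n")
    # "^p " cannot match the first paragraph, because nothing precedes it,
    # so check the start of the file separately.
    if text.startswith(" "):
        text = text[1:]
    return text


sample = " Opening line.\n Second paragraph.\nThird paragraph.\n"
print(repr(strip_leading_spaces(sample)))
# 'Opening line.\nSecond paragraph.\nThird paragraph.\n'
```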

The Problem With Tabs

We’re now ready to deal with the problem of tabs. The name comes from “tabular matter” and dates from typewriter days, when little metal sliders allowed you to return to the same spot on the page on subsequent lines.

If you open most book files and look at the codes you’ll find tab codes in odd places. Sometimes paragraphs are indented on the first line by the author hitting the tab key. Sometimes they appear at the end of a paragraph, as if they’ve gotten lost. And sometimes there are lots of tabs on what appear to be blank lines, like they’ve gone on a trip somewhere.

But in books we use tabs sparingly, and only for specific reasons. We never use them to indent paragraphs, or to position graphics or photos, or for many of the reasons people put them into their word processing files. I use tabs for:

  • Tabular material including the Contents page and other simple charts or lists. More complex or extensive lists are better handled with a “table” function found in most layout programs.
  • Some kinds of lists. For instance, when we format Notes sections and want the note numbers to align on a decimal point (flush right) and the note following to align normally (flush left) on the same line, we can accomplish this with tabs.

All the rest of the tabs have to come out. And if you don’t have one of these cases where you do need the tabs, it’s faster and more efficient to simply take them all out of the file.

Eliminating tabs

In the “find/replace” dialog we’ll use the code for a tab character ^t in the find box, and put nothing in the replace field. This turns the “find/replace” function into a “find/delete” function. It’s important to put your cursor into the “replace” field to make sure you haven’t left a space character in there from your previous operations. One click should remove all the tabs from your document.
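In Python terms, with ^t modeled as the tab character "\t", the find/delete is a replace with an empty string. The function name delete_tabs is hypothetical. Note how the line that held nothing but tabs becomes an empty paragraph, which is exactly what the final double-return pass cleans up:

```python
def delete_tabs(text: str) -> tuple[str, int]:
    """Find ^t and replace it with nothing, turning find/replace into find/delete."""
    return text.replace("\t", ""), text.count("\t")


sample = "\tAn indented paragraph.\nA line ending in a tab.\t\n\t\t\n"
cleaned, removed = delete_tabs(sample)
print(removed)  # 4
```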

No More Extra Returns!

Now we’re ready for a final “find/replace” and this one is to get rid of the extra paragraph returns that are in the file.

Eliminating double paragraph returns

By now you can probably guess what we’re going to do. Here the find is simply ^p^p (two paragraph returns in a row, which create an unwanted and potentially troublesome extra line), and the replace is a single paragraph return. You might have to run this one a few times to get rid of all the extra returns.
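This last pass also shows why the order of the steps matters. In the Python sketch below (^p modeled as "\n"; the function name collapse_double_returns is hypothetical), a run of four returns needs two passes, while a “blank” line that still holds a stray space is not an empty paragraph at all, so it survives until the space has been removed:

```python
def collapse_double_returns(text: str) -> str:
    """Replace ^p^p with ^p, repeating the pass until no empty paragraphs remain."""
    while "\n\n" in text:
        # One pass shortens a run of returns by about half, so a run of
        # four returns needs two passes.
        text = text.replace("\n\n", "\n")
    return text


# A run of four returns after the heading collapses to one.
print(repr(collapse_double_returns("Chapter One\n\n\n\nIt was a dark night.\n")))
# 'Chapter One\nIt was a dark night.\n'

# A "blank" line holding a stray space is left untouched, which is why
# spaces and tabs are stripped before this step.
print(repr(collapse_double_returns("Chapter One\n \nIt was a dark night.\n")))
# 'Chapter One\n \nIt was a dark night.\n'
```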

More Fun with Find and Replace

All of the operations we’ve reviewed here are pretty simple once you know how to use the special characters, and you know what result you want to end up with.

But Word’s Find and Replace has a lot more power and utility than that. Word offers a whole list of “special” characters you can manipulate. Note the “Find What Text” and “Clipboard” options. These provide a lot of creative flexibility in manipulating your text files.

As a quick example, I had a document that had been formatted in Word with lots of italics for book titles. I wanted to move the document to HTML to use on a web page, so I needed to swap the Word italic formatting for HTML italic tags. Find and Replace can do this easily:

It doesn't look like much, but it works wonders

Notice the formatting instruction in Find: it will find anything in italic. The Replace uses the HTML tags that begin and end italics. In between is the “Find What Text” code ^&, along with the instruction to remove the italic formatting. In one click every instance of italic will be de-italicized and wrapped in HTML tags.
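Python strings carry no character formatting, so this sketch models a Word document as a hypothetical list of (text, is_italic) runs; italics_to_html and the sample titles are illustrative, and <i> and </i> are assumed as the HTML tags. It applies roughly the same transformation as the Replace described above:

```python
from itertools import groupby

# Hypothetical stand-in for a Word document: a list of (text, is_italic)
# runs, since plain Python strings carry no character formatting.
Run = tuple[str, bool]


def italics_to_html(runs: list[Run]) -> str:
    """Wrap italic text in <i>...</i> and drop the italic flag.

    Mirrors the Find/Replace above: find italic text, replace it with
    <i>^&</i> (^& being the found text) formatted as not italic.
    """
    parts = []
    # Merge neighboring runs with the same formatting, assuming Word's
    # formatting search treats consecutive italic text as one match.
    for is_italic, group in groupby(runs, key=lambda run: run[1]):
        text = "".join(piece for piece, _ in group)
        parts.append(f"<i>{text}</i>" if is_italic else text)
    return "".join(parts)


runs = [("I reread ", False), ("Moby-", True), ("Dick", True),
        (" and ", False), ("Walden", True), (" last summer.", False)]
print(italics_to_html(runs))
# I reread <i>Moby-Dick</i> and <i>Walden</i> last summer.
```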

Takeaway: Learning to use the Find and Replace functions in your word processor will make doing the necessary file preparation before book layout efficient and consistent.
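Once the routine is familiar, the whole sequence can be run in its fixed order in one go, which is what keeps it consistent. Here is a rough Python sketch of that sequence for plain-text files, using the standard re module. Its regular expressions are similar in spirit to Word’s wildcard searches but are not the same syntax; a quantifier such as ` {2,}` matches a whole run of spaces, so each step needs only one pass. The function name clean_manuscript is hypothetical, and the sketch assumes paragraphs are separated by "\n" and that, as described above, every tab should go:

```python
import re


def clean_manuscript(text: str) -> str:
    """Run the clean-up steps, in the article's order, on plain text."""
    text = re.sub(r" {2,}", " ", text)                   # runs of spaces -> one space
    text = re.sub(r" +\n", "\n", text)                   # trailing spaces
    text = re.sub(r"^ +", "", text, flags=re.MULTILINE)  # leading spaces
    text = text.replace("\t", "")                        # all tabs
    text = re.sub(r"\n{2,}", "\n", text)                 # extra paragraph returns
    return text


sample = "Chapter One  \n\n\tIt was late.  Very late.\n \n\n The end.\n"
print(repr(clean_manuscript(sample)))
# 'Chapter One\nIt was late. Very late.\nThe end.\n'
```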


Image licensed under a Creative Commons Attribution 3.0 License, original work copyright by j nelson, http://www.flickr.com/photos/32807937@N00/

Comments

    leo craton June 28, 2014 at 8:47 am

    I’m looking for any information about how to line up the bottoms of facing pages in a manuscript using Word (I have both Word 2010 and 2011.) Can you point me to any article, blog, or training video that addresses this?

    Thanks

    Leo, a frustrated self-publisher

    Joseph Irvine April 1, 2013 at 1:41 pm

    Joel,

    Thanks for these tips, I use the Find & Replace function quite a bit, but I was unaware of these additional functions. They will help me out a great deal.

    Maura van der Linden October 6, 2011 at 10:04 am

    Hendrik,

    I also strongly recommend using styles. I tend to create one named “sceneopen” and set the paragraph style to have a before paragraph spacing of whatever I need to match the house style.

    hendrik October 6, 2011 at 9:38 am

    Very informative! I use 4 extra returns to create a space between scenes within chapters. From the discussion above, I presume that invisible extra returns exist in files that appear to be clean. Is there a way to get rid of those invisible extra returns without eliminating the desired extra spacing between scenes?
    Thanks…

    Joel Friedlander October 6, 2011 at 9:57 am

    Hendrik,

    You can do this easily by using Word’s Styles feature. Create a style for the opening of a scene and, in the Paragraph options, set a “space before” equal to the space you would like to have separating scenes. The space will automatically be added without any extra characters or paragraph returns. This is the same thing your book designer will do in their page layout software when the file gets to them.

    hendrik October 6, 2011 at 10:30 am

    Thanks Joel, I’m just entering this intimidating new file-prepping, self-pubbing world and appreciate the information your site provides. I can’t afford to hire it done so am looking at a steep and probably long learning curve to publication.

    Thanks!

    Carradee October 5, 2011 at 11:14 am

    Here’s what confuses me about the “two spaces between each sentence is redundant” idea: with fiction, sentences don’t always end with official ending punctuation. So it actually isn’t redundant.

    Particularly if you’re writing in a close POV, a sentence might end with an ellipsis or em dash, for example. If you don’t have some way to differentiate between “This is a continued sentence [fragment]” and “This is a new sentence [fragment]”, that can easily get confusing.

    (Avoiding confusion is also why, in my own writing, I only use double quotation marks for things that are spoken aloud.)

    Ooo, I didn’t know that ^& tip, though! Thanks! I think that should make formatting for Kindle easier. I should be able to write a macro for that, now.

    Also, if you format directly for PubIt, I’ve found it’s easiest to format as if for Smashwords, except to insert “Section Break (Next Page)” for each page break, to produce the Nook version. Use “Find & Replace” to replace all section breaks with page breaks, and you have your Smashwords version—after you change the edition note, of course.

    Joel Friedlander October 5, 2011 at 11:17 am

    Thanks for the info, Carradee, that’s very helpful.

    No matter what kind of punctuation your sentence ends with, there should still be only one space between the end of one sentence and the beginning of the next.

    Carradee October 5, 2011 at 4:01 pm

    I understand that two spaces is the “rule”, but my point is that the attempt to justify it doesn’t make sense. :)

    Maura van der Linden October 5, 2011 at 8:42 am

    I actually just finished writing a class on “Word for Writers” and cover quite a bit of this in the class. There are a lot of Word nuances that can really help make the process of formatting far easier when that time comes.

    Thanks for the great article!

    Joel Friedlander October 5, 2011 at 11:13 am

    Thanks, Maura, glad you found it helpful. I’m always looking for articles that will help writers get their manuscripts ready for production, if you’re interested, email me at marin.bookworks (at) gmail.com

    Maura van der Linden October 6, 2011 at 10:08 am

    I dropped you a note yesterday, Joel. Hopefully your spam filter didn’t eat it :)

    Gordon Burgett October 4, 2011 at 11:05 am

    As usual, great job, Joel.

    I remember when I started giving seminars about publishing I would remind them not to double-space between sentences. You’d think I was telling them to eat their children. Most of the academics just shook their head no. But finally some brave soul would ask me to say that again. So I would, like you did here. Still, I don’t think I got many takers early on. Any idiot who had ever typed out a college paper, they were thinking, knew better than that. Some still seem to think it’s heresy!

    Excellent step-by-step process.

    paula hendricks March 31, 2011 at 9:03 am

    thanks joel. as usual, helpful and to the point. i have used a lot of these steps, but i am still uncomfortable that i am not getting a “plain text” file before i begin… this seems even more critical for ebook conversion than for indesign where i have some tools to clean stuff up… have you written about converting the whole thing to truly plain text? on the mac there is no notepad… so i’ve been investigating text wrangler etc… and i know word itself can add a lot of hidden characters…. i’m trying to move to open office. thanks mucho.

    ph

    Joel Friedlander March 31, 2011 at 10:09 am

    Hi Paula. I often convert files to plain text to get rid of all the format codes. Of course, you have to account for local formatting like italics that you want to keep in the file.

    In Word, I simply save the file as “Text Only” then close the file. When you re-open it, you’ll have a plain text file that’s squeaky clean, no other programs necessary.

    I do use TextWrangler, but it’s more for code, and I prefer using Word and then saving the file as Text Only.

    Hope that helps.

    Julie Weight March 27, 2011 at 3:11 pm

    Macros. They work wonders.

    Cindi March 19, 2011 at 4:50 am

    Done. Thank you very much for the quick response. I decided on a 5 1/2 X 8 1/2 book size. I actually like Times New Roman type, and it seems that I should have the manuscript in 12 point text. If I set the margins to have the page text blocks about the same size as the final book will be, the page number ends up far below the text. At the moment, I presume I will pay someone to convert using InDesign unless I decide to buy and attempt to learn the program myself. I’m finding your website very helpful.

    Joel Friedlander March 19, 2011 at 12:57 pm

    Cindi, thanks for that. Take a look at some of the sample interiors I have posted here to get ideas about how you want your book to look in the end. I don’t recommend people buy InDesign and learn it just to produce one book, it’s simply not efficient. Shop around and find someone you can afford and let them do it, or learn to use Word really well.

    Here’s a link: Design Samples and Case Studies

    Cindi March 18, 2011 at 6:32 pm

    I’m a little scared here. No problem eliminating over 2000 extra spaces after the periods. But if I take out the tabs, what happens to the should-be-indented paragraphs? Much of my novel is conversation, so there are a lot of indents. Help.

    Joel Friedlander March 18, 2011 at 7:21 pm

    Cindi, no need to be alarmed. It’s much better to use the functions built into the program than to throw tab codes into the file. If you’re using Word, the Paragraph dialog has an Indentation setting for “First line” (under “Special”), and you can just enter an indent there (like 1/2″) and Word will take care of it.

    Derek Murphy November 17, 2010 at 3:40 pm

    This is amazing. I’ve been going through my book manually several times with each page size change; I’ll try these out next time.

    Ed Eubanks November 11, 2010 at 7:32 am

    Great tips and ideas as usual, Joel.

    I have been using InDesign’s find/replace tools instead, recently. They are even more flexible and powerful, and the feature in ID CS5 to find “in all open documents” means I can complete find/replace routines for an entire book in one pass.

    I realize many don’t have or use InDesign, but those who do will benefit even more when it comes to such early document prep.

    Joel Friedlander November 11, 2010 at 2:40 pm

    Ed, InDesign’s Find/Change is particularly powerful, but because of that users have to be pretty careful they know what any particular operation will do to their file. This article was aimed at the vast majority of authors and DIY self-publishers who overwhelmingly use Word. Sometime we’ll have to do a big rundown of the Find/Change. You could almost write a book about it. Thanks for your input.

    Maggie November 10, 2010 at 2:45 pm

    Joel: thank you, yet again, for another clearly written article on what it takes to whip an author’s manuscript into shape.

    Used well, Word’s search and replace (aka: search and destroy) function is awesome; used on the fly, without thought or in a hurry, it can cause a big mess. One has to pay strict attention to all variables before hitting the button.

    The steps that Joel points out are just a few of the many that typesetters go through when preparing an author’s manuscript for production. And I’m sure, like me, Joel gets seriously exercised when misinformed people ask why, in this day and age, we need book designers and typesetters when all you have to do is hit a button and your manuscript will be transformed into gorgeous pages.

    Joel Friedlander November 10, 2010 at 8:17 pm

    Looks like “file prep week” here, unintentionally.

    Maggie, my favorite was the client who said to me:

    “Book design? All you do is put the little numbers on the pages, right?”

    As you pointed out, this article is designed as a “safe” way for people to get the idea of how to clean up their own files.

    I’m not under the illusion that all of a sudden I’m going to get clean-looking files to work on, but listening to the stories of what people go through, spending hours and hours poring over their files, just makes me want to show them a faster, more consistent method.

    Rima November 10, 2010 at 9:58 am

    Joel – This is unbelievably helpful, especially considering I use Word to write, and that’s it. I don’t know how to do anything else on it. I had no idea I could do all this (for my ebook formatting I manually deleted all tabs. 2 hrs of my life gone forever).

    Please post more on this topic!

    Joel Friedlander November 10, 2010 at 8:18 pm

    Stay tuned, Rima, more is coming!

    Deb Dorchak November 10, 2010 at 9:08 am

    Thanks for this, Joel, perfect timing as we’re getting ready to go to layout on our own novel. This was extremely helpful.

    Will you be doing a post about cleaning up styles in Word? So many people just use the toolbar to put in things like bold, italics or other styles and that really wreaks havoc when imported into InDesign because there’s a ton of unnecessary styles digital editions of books won’t recognize.

    Joel Friedlander November 10, 2010 at 9:46 am

    Ah, yes, all that local formatting that causes people so many problems. There are some good practices for moving text from Word to InDesign I can post about—watch for it.

    Deb Dorchak November 10, 2010 at 9:50 am

    Joel, you’re the best! Thanks! I’m looking forward to reading that.

    Walt Shiel November 10, 2010 at 4:45 am

    MS Word has a lot more find/replace capabilities than mentioned above, as not all the possible codes are in those drop-down options. Just do a search in Word’s Help for “wildcards” and click on the find/replace result that comes up.

    Also, you can automate this process. Once you’ve figured out exactly what steps you want to use to clean up the file, create a macro to do it. Word makes macro creation very easy: start recording, run the steps in the order you want them, and then stop the recording. You can name the macro anything you want and put a button on the top bar for easy access. I actually have five macro buttons up there for tasks that I do a lot, including one that handles the clean-up steps mentioned in this article (plus a few others using the wildcard capabilities).

    Joel Friedlander November 10, 2010 at 8:07 am

    Thanks for the power-user tips, Walt. Macros can save a huge amount of time by automating many of these processes.

    New users ought to proceed with a bit of caution. Until you understand exactly what’s in your file, and what the effect of the macro will be throughout the document, it’s probably good to go slowly with automation.

    R Thomas Berner November 10, 2010 at 4:21 am

    This is very good advice. I keep telling people who send me items for our development’s newsletter not to format. I’ll do that. Still, I get wrongly formatted stories that require some of the actions you mention. And if the contributor is of a certain age, there are always two spaces after a sentence-closing period. :-)

    Joel Friedlander November 10, 2010 at 8:05 am

    Thomas, I feel your pain. There seems to be nothing that can stop people from hitting that space bar twice after a period. The lessons of the typing teachers are pretty persistent. Luckily, that’s one of the easier cleanups.
