Speaking in Code: Ebook HTML basics

POSTED ON Sep 28, 2016


If, as I keep saying, an ebook is just a website in a box, then in order to know how to get in and edit your ebook, you’re going to want to know some HTML. However you choose to work on the file, knowing the basic building blocks is essential in creating a finished product that presents your book to its best advantage.

When we talk about HTML, we’re actually talking about two separate things:

HyperText Markup Language (HTML): The code that makes up every web page you’ve ever seen. This is how we’ll add content to the ebook. That’s what the rest of this post is going to be about.
Cascading Style Sheets (CSS): This is a set of rules for defining how everything looks. It’s how we’ll format the ebook — and that’s what I’m going to be covering in the next post.

HTML

Calling HTML the basic building blocks of an ebook is apt in more ways than one. Not only is the markup language the fundamental tool for writing and displaying web and ebook content, but it’s also set up as a series of containers — not unlike the nested building blocks that we used to build with as children. Sometimes they’re piled one on top of the other; sometimes one (or a group) is inside another.

Every one of those “blocks” is set off before and after by tags, each of which is marked by angle brackets (< and >). The beginning of a block is marked by an open tag that looks like this: <tag>. The block ends with a close tag that looks like this: </tag>. All of the content between those tags is said to be part of that container.

So you’d start a paragraph with the <p> tag and end it with the </p> tag:
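A minimal sketch of such a paragraph (the sentence itself is made up for illustration):

```html
<p>This is a paragraph.</p>
```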

The tags will set that chunk of text off as a paragraph. Here’s how that would look:

Simple, right?

Now, to make things just a bit more complicated, tags can also take what are called attributes, which tell the ereader how to treat certain tags.

An attribute is always added by including the attribute name, an equal sign, straight quotes, and a value. Here’s the same paragraph tag with an attribute added:
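As a sketch, assuming the same made-up sentence and an align attribute with the value right (an attribute this post's own footnote describes as deprecated, so treat it purely as a demonstration of the syntax):

```html
<p align="right">This is a paragraph.</p>
```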

I wonder what that align attribute would do?

What a surprise! I could also have had the attribute read left, justify, or center.^[1]

One warning: make sure that the quote marks around the attribute value are straight (i.e., "right") and not “smart” or “curly” quotes (i.e., “right”). Otherwise the ereader won’t know what it’s looking at and your ebook will break.^[2]

Now there are (very basically) two kinds of tags: block tags and inline tags.^[3] Containers set off by block tags need not be placed inside other containers.^[4] Containers set off by inline tags are always placed inside other containers.

Block Tags

I told you “block” was an appropriate metaphor in more ways than one!

Block tags create self-contained blocks of text, like paragraphs (<p></p>) or articles (<article></article>). Many of them can be used at the “root level” of the page: as I said, they need not be inside any other tag (other than the <html> and <body> tags that I’ll talk about later). Many of them can also be nested, that is, placed inside another block tag.

Note that when you nest tags, you must always close the most recently opened tag first. Here’s an example:
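A minimal sketch of that nesting rule, with a paragraph placed inside an article (the text is invented for illustration):

```html
<article>
<p>This paragraph sits inside the article.</p>
</article>
```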

Notice that I couldn’t put the end-of-article tag (</article>) until I’d put the end-of-paragraph tag (</p>).

(The <article> tag is part of HTML5, the newest version of HTML in use in web and ebook design.)

Now, in HTML, hitting the RETURN key doesn’t create a new paragraph as you might expect. Some HTML and ePub editing software will take care of that for you by automatically adding paragraph tags (<p></p>) around your text. If you are used to using a blogging system like WordPress, you probably weren’t even aware that those tags were getting added.

When you’re working on the raw HTML in an ebook, however, you don’t get those training wheels.

And unless there’s a tag, the text will simply continue to flow. For example:
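A small sketch of text that contains line breaks but no tags between the lines (the sample lines are invented for illustration):

```html
<p>This is the first line.
This is the second line.

This is the third line.</p>
```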

would display as:

In other words, it would look exactly the same as if the line breaks hadn’t been there.

The advantage to this is that you can place the tags on separate lines of code from the text, which makes it easier to see what tags are still open. Web programming conventions encourage us to indent each nested tag, to make it even easier to see what’s going on:
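A sketch of that convention, assuming two spaces of indentation per nesting level (the amount is a matter of taste, and the text is invented):

```html
<article>
  <p>
    This is the first paragraph.
  </p>

  <p>
    This is the second paragraph.
  </p>
</article>
```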

Here’s how that would look:

The indentation in the code makes it easier to see which blocks are inside which, and can be invaluable when it comes time to debug (see below). As the example showed, it won’t be displayed as part of the ebook. Nor will the white space between the paragraphs. The only thing that will create a space between paragraphs in a web page (or an ebook) is a block tag.

More Block Tags

We’ve seen the most commonly used block tag, the wonderful paragraph tag (<p></p>).

Here are some more commonly used block tags:

<blockquote></blockquote>

This creates a block quote or extract. Usually this text is displayed in a smaller font size and indented further — but this can be controlled through CSS. You can have multiple paragraphs inside a block quote:
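A minimal sketch of a two-paragraph block quote (the quoted text is placeholder wording, not from a real source):

```html
<blockquote>
  <p>This is the first paragraph of the extract.</p>
  <p>This is the second paragraph of the extract.</p>
</blockquote>
```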

Here’s how that would look:

<table></table>

This creates a table structure. To set off the rows and cells in a table you need further tags:

<tr></tr>
This creates a table row. This must appear inside of a <table> block.
<td></td>
This creates a table cell. These are always inside a <tr> block.
<th></th>
This creates a header cell, which behaves just like a regular (<td>) cell, but can be formatted differently.

Here’s an example of a basic table:
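A sketch of a two-column table, assuming one header row and one row of regular cells that each hold a paragraph (all of the text is invented):

```html
<table>
  <tr>
    <th>Left column</th>
    <th>Right column</th>
  </tr>
  <tr>
    <td><p>This is the first paragraph.</p></td>
    <td><p>This is the second paragraph.</p></td>
  </tr>
</table>
```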

And here’s how that would look:

Notice that the paragraphs are side by side now, instead of stacked one on top of the other. That’s because they’re part of the same row (<tr></tr>) block.

<h1></h1>

This is the tag to identify the most important header on a page. Traditionally, there should be only one per “page.” In ePub coding, this is often the tag used for a chapter title:
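A minimal sketch of a chapter title followed by the first paragraph (the title and text are invented):

```html
<h1>Chapter One</h1>
<p>This is the first paragraph of the chapter.</p>
```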

Here’s how that would look:

<h2></h2>, <h3></h3>, etc.

These tags set off lower-priority headings — chapter sub-titles, section heads, etc. (In publishing parlance, these are called A heads, B heads, and so on.)
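As a sketch, a chapter with a subtitle and one section head might be marked up like this (all of the headings are invented for illustration):

```html
<h1>Chapter One</h1>
<h2>In Which the Story Begins</h2>
<p>This is the first paragraph of the chapter.</p>
<h3>A Section Head</h3>
<p>This is the first paragraph of the section.</p>
```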

Here’s how that would look:

<hr></hr>

This displays a horizontal rule (that’s what the “hr” stands for): a hairline that divides the page horizontally. Note that, since there isn’t ever any text between the open and close tags, you can combine them: <hr/>

<br></br>

Creates a line break — not a full paragraph break, but a forced move to the next line. To be honest, these are a last resort; you’re often better off using another block tag.

As with the <hr/> tag, the open and close tags are usually combined: <br/>

<img src="[file location]" alt="A description of the image"></img>

This tag will display an image.

The source (src) attribute tells the ebook reader where to look for the image file. The file location will either be a URL (on the open web — it will look like this: https://website.com/image.jpg) or a URI (a file inside the ebook or web page’s local file structure — it will look something like this: ../Images/image.jpg).^[5]

The alt attribute gives the image a text description, which is important, for example, when sight-impaired readers use text-to-speech to have the book read aloud. It should also display if you have attempted to display an image from the internet and the ereader isn’t connected.

As with the <hr/> and <br/> tags, the open and close tags are usually combined: <img src="[location]" alt="[description]"/>
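As a minimal sketch, an image stored inside the ebook might be placed like this (the file name cover.jpg and the alt text are hypothetical; the ../Images/ folder follows the example path above):

```html
<img src="../Images/cover.jpg" alt="The front cover of the book"/>
```

Note the straight quotes around both attribute values.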

Others

There are a lot of other HTML block tags that you can use, especially in the newer ePub3 ebook standard, such as <section></section>, <aside></aside>, and <nav></nav>, which ePub3 can label with an epub:type attribute (for example, epub:type="frontmatter" or epub:type="chapter"). However, they are not frequently used at this point, and aren’t crucial to creating a well-designed ebook. If you would like to learn more about them, check out the IDPF accessibility guidelines, which offer a wonderful overview of the ePub3 format.

Now, I left out three mandatory block tags that appear in every ePub file, the <html></html>, <head></head> and <body></body> tags. The <html> tag is always the outermost block. The <head> and <body> tags are nested immediately inside, with the <head> tag coming before the <body> tag. To be complete, a file must look something like this:
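A sketch of such a bare-bones page, assuming an invented title and text, with a comment in the header. It deliberately leaves out the extra declarations an ebook needs:

```html
<html>
  <head>
    <!-- The head holds information about the page; it is not displayed. -->
    <title>Chapter One</title>
  </head>
  <body>
    <h1>Chapter One</h1>
    <p>This is the first paragraph of the chapter.</p>
  </body>
</html>
```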

That’s now an actual, complete HTML page! If you copy that into a text editor and save it with the file type html, you can open it with your web browser. Go you!

By the way, did you see the <!-- and --> tags in the header? Those mark the beginning and end of a comment: a coder’s note that isn’t intended to be displayed.

We’d have to add a couple of things to make that code ebook-ready — I’ll be going into those in detail next time — but you could take that same file and import it into an ebook, and it would display perfectly.^[6]

Inline Tags

These HTML tags are largely used for formatting. Many of them are used with CSS to help with formatting, and so I won’t touch on those this time.

These tags always appear nested inside of a block tag — usually a paragraph (<p></p>). A few that are important and incredibly useful are these:

<strong></strong>

You put this tag around text to make it bold. The older version of the tag (<b></b>) still works, but <strong> is generally preferred because it says that the text is important rather than just describing how it looks. What happens when you place one <strong> section inside another? Usually it’s just bolded, but some fonts or styles have additional changes that can be made.

<em></em>

You put this tag around text to make it italic. Again, the older version of the tag (<i></i>) still works, but <em> is generally preferred for emphasis. Nested <em> tags can do some nifty things. When you put one <em> tag inside another (as in a foreign word or a book title inside of an italicized section), the inner text will display not as italic, but as roman (regular) type. (This won’t work on every browser or ereader, but it’s nice when it does.)
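A small sketch of a nested <em>, assuming a book title inside an italicized sentence (the sentence is invented):

```html
<p><em>I stayed up all night reading <em>Moby-Dick</em> again.</em></p>
```

Where the ereader supports it, the title would appear in roman type inside the italic sentence.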

<a href="[file location]"></a>

This is the tag that made the web what it is and that makes an ebook unlike its print cousin: the hyperlink. Anything within the tag — whether it’s actual text or an image tag — becomes a link.^[7]

You can either link to an outside web page (a URL) or link to a file (or location within a file) within the file structure of your ebook (a URI).

Here’s how you’d create a hyperlink:
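A minimal sketch of a link to an outside web page (https://website.com/ is a placeholder address and the link text is invented):

```html
<p>You can read more on <a href="https://website.com/">my website</a>.</p>
```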

And it would look something like this:

<sup></sup>

This makes the text superscript — that is, it shifts the text up (and usually makes it smaller).

<sub></sub>

This makes the text subscript — that is, it shifts the text down (and, again, usually makes it smaller).
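As a quick sketch of both tags in use inside a paragraph (the sentence is just an illustration):

```html
<p>Water is H<sub>2</sub>O, and Einstein wrote E = mc<sup>2</sup>.</p>
```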

<span></span>

This defines a section within a block of text. It can be used for identification purposes, or, most commonly, for formatting the text. We’ll be going into this one in depth next time.

Identity code: the ID attribute

By the way, by adding an id attribute, you can turn any of these tags into anchors — locations within the file. You point an <a> tag at the anchor by adding a pound sign (#) and the id to the file location in the href attribute.

So let’s say we gave an image file an id of image (imaginative, right?):
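A sketch of that image tag, assuming the example file location used earlier in this post:

```html
<img id="image" src="../Images/image.jpg" alt="A description of the image"/>
```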

If I wanted to link to the location of the image in a file, I’d create a hyperlink:
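As a sketch, assuming the image lives in a file called chapter1.html (a hypothetical file name) and carries the id image:

```html
<a href="chapter1.html#image">Jump to the image</a>
```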

By the way, each id must be unique in the file: you can’t have more than one id="image".

You can use this to create back-and-forth reciprocal links — something I do all the time when I’m creating footnotes. Each <a> gets its own id; you link back and forth. Let’s say you put this into the file chapter1.html:
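A sketch of the footnote reference, assuming the ids noteref1 (for this link) and note1 (for the note in notes.html); both ids and the quoted sentence are invented for illustration:

```html
<p>"This is the sentence being quoted."<a id="noteref1" href="notes.html#note1"><sup>1</sup></a></p>
```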

and this into the file notes.html:
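A sketch of the matching note, assuming the note's link has the id note1 and points back at a reference with the id noteref1 in chapter1.html (both ids and the note text are invented):

```html
<p><a id="note1" href="chapter1.html#noteref1">1.</a> This is the text of the footnote.</p>
```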

You would be able to click on the footnote link after the quote to go to the note, and then click on the link at the beginning of the note to go back to the quote. Whew!

Obviously, there’s a lot more to learn about HTML. A great free resource is the online reference W3Schools.com. They’ve got a complete and up-to-date listing of every tag, attribute, and weird wrinkle, not just for HTML, but for CSS and JavaScript. There are tutorials, examples, and all sorts of great information.^[8]

Next time, I’ll be talking about how to make our ebooks beautiful. Time to bring on the style!

^[1] Okay. So the align attribute is deprecated — that means it’s essentially obsolete and shouldn’t be used. There are better ways to handle alignment, as we’ll learn in the post on CSS!
^[2] This is why I don’t edit code in word processing software like Microsoft Word or OpenOffice.
^[3] Okay. So block and inline are also possible values for the CSS display property. As are inline-block (!), none, and a few more. But honestly? You don’t care about that yet. You may never.
^[4] Though many of them can be.
^[5] See the post on what’s inside of an ebook for my discussion of URIs and URLs.
^[6] It would display, but it wouldn’t validate — you couldn’t open the file on an ereader.
^[7] You know all of those buttons you’ve been clicking on on web pages and in apps? Those are just images with hyperlink tags around them. (Well, okay, sometimes they’re more complicated than that — but that’s the concept.)
^[8] Just remember, when you’re researching for ebooks, you want to use the latest HTML5 syntax. And the W3Schools references will always let you know if a tag is no longer recommended.