Speaking in Code: Ebook HTML basics

by | Sep 28, 2016

If, as I keep saying, an ebook is just a website in a box, then in order to know how to get in and edit your ebook, you’re going to want to know some HTML. However you choose to work on the file, knowing the basic building blocks is essential to creating a finished product that presents your book to its best advantage.

When we talk about HTML, we’re actually talking about two separate things:

  1. HyperText Markup Language (HTML): The code that makes up every web page you’ve ever seen. This is how we’ll add content to the ebook. That’s what the rest of this post is going to be about.
  2. Cascading Style Sheets (CSS): This is a set of rules for defining how everything looks. It’s how we’ll format the ebook — and that’s what I’m going to be covering in the next post.

HTML

Calling HTML the basic building blocks of an ebook is apt in more ways than one. Not only is the markup language the fundamental tool for writing and displaying web and ebook content, but it’s also set up as a series of containers — not unlike the nested building blocks that we used to build with as children. Sometimes they’re piled one on top of the other; sometimes one (or a group) is inside another.

Every one of those “blocks” is set off before and after by tags, each of which is marked by angle brackets (< and >). The beginning of a block is marked by an open tag that looks like this: <tag>. The block ends with a close tag that looks like this: </tag>. All of the content between those tags is said to be part of that container.

So you’d start a paragraph with the <p> tag and end it with the </p> tag:

example-1
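Here’s a minimal sketch of that markup. The sentence inside the tags is placeholder text, not the original example:

```html
<!-- One paragraph: the open tag, the text, then the close tag -->
<p>This is a paragraph of text in my ebook.</p>
```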

The tags will set that chunk of text off as a paragraph. Here’s how that would look:

image-1

Simple, right?

Now, to make things just a bit more complicated, tags can also take what are called attributes, which give the ereader extra instructions about how to treat that particular tag.

An attribute is always added inside the open tag by including the attribute name, an equal sign, and a value enclosed in straight quotes. Here’s the same paragraph tag with an attribute added:

example-2
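For illustration, here’s a paragraph tag carrying an align attribute (placeholder text again). The attribute sits inside the open tag, never the close tag, and its value is wrapped in straight quotes:

```html
<!-- Attribute name, equal sign, then the value in straight quotes -->
<p align="right">This paragraph should line up against the right margin.</p>
```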

I wonder what that align attribute would do?

image-2

What a surprise! I could also have had the attribute read left, justify, or center.[1]

One warning: make sure that the quote marks around the attribute value are straight (i.e., "right") and not “smart” or “curly” quotes (i.e., “right”). Otherwise the ereader won’t know what it’s looking at and your ebook will break.[2]

Now there are (very basically) two kinds of tags: block tags and inline tags.[3] Containers set off by block tags need not be placed inside other containers.[4] Containers set off by inline tags are always placed inside other containers.

Block Tags

I told you “block” was an appropriate metaphor in more ways than one!

Block tags create self-contained blocks of text, like paragraphs (<p></p>) or articles (<article></article>). Many of them can be used at the “root level” of the page: as I said, they need not be inside any other tag (other than the <html> and <body> tags that I’ll talk about later). Many of them can also be nested, that is, placed inside another block tag.

Note that when you nest tags, you must always close the most recently opened tag first. Here’s an example:

example-3

Notice that I couldn’t put the end-of-article tag (</article>) until I’d put the end-of-paragraph tag (</p>).
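Here’s a sketch of that rule with placeholder text. The first version is properly nested. The second closes the article while the paragraph is still open, so it isn’t well-formed, and an ePub validator would reject it:

```html
<!-- Correct: the paragraph (opened last) is closed first -->
<article><p>A paragraph nested inside an article.</p></article>

<!-- Incorrect: the article is closed before the paragraph inside it -->
<article><p>A paragraph nested inside an article.</article></p>
```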

(The <article> tag is part of HTML5, the newest version of HTML in use in web and ebook design.)

Now, in HTML, hitting the RETURN key doesn’t create a new paragraph as you might expect. Some HTML and ePub editing software will take care of that for you by automatically adding paragraph tags (<p></p>) around your text. If you are used to using a blogging system like WordPress, you probably weren’t even aware that those tags were getting added.

When you’re working on the raw HTML in an ebook, however, you don’t get those training wheels.

And unless there’s a tag, the text will simply continue to flow. For example:

example-4

would display as:

image-3

In other words, it would look exactly the same as if the line breaks hadn’t been there.
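A minimal sketch of that behavior (placeholder text). The code spans three lines, but there’s only one pair of paragraph tags, so the line breaks are treated as ordinary spaces:

```html
<!-- Three lines of code, but only one paragraph -->
<p>These words sit
on three separate lines
in the code.</p>
```

An ereader would show this as one continuous run of text: “These words sit on three separate lines in the code.”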

The advantage to this is that you can place the tags on separate lines of code from the text, which makes it easier to see what tags are still open. Web programming conventions encourage us to indent each nested tag, to make it even easier to see what’s going on:

example-5
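Here’s a sketch of that indentation convention, using placeholder text. Each nested tag is pushed in one level further than the tag that contains it:

```html
<article>
  <!-- Paragraphs are indented one level inside the article -->
  <p>
    The first paragraph, with its text indented one more level.
  </p>
  <p>
    The second paragraph, at the same level as the first.
  </p>
</article>
```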

Here’s how that would look:

image-4

The indentation in the code makes it easier to see which blocks are inside which, and can be invaluable when it comes time to debug (see below). As the example showed, the indentation won’t be displayed as part of the ebook. Nor will the white space between the paragraphs. The only thing that will create a space between paragraphs in a web page (or an ebook) is a block tag.

More Block Tags

We’ve seen the most commonly used block tag, the wonderful paragraph tag (<p></p>).

Here are some more commonly used block tags:

<blockquote></blockquote>

This creates a block quote or extract. Usually this text is displayed in a smaller font size and indented further — but this can be controlled through CSS. You can have multiple paragraphs inside a block quote:

example-6
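A sketch of a two-paragraph extract sitting between two ordinary paragraphs (all of the text is placeholder):

```html
<p>The main text introduces the quotation.</p>
<blockquote>
  <!-- Both paragraphs belong to the same block quote -->
  <p>First paragraph of the extract.</p>
  <p>Second paragraph of the extract.</p>
</blockquote>
<p>The main text picks up again here.</p>
```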

Here’s how that would look:

image-5

<table></table>

This creates a table structure. To set off the rows and cells in a table you need further tags:

  • <tr></tr>
    This creates a table row. This must appear inside of a <table> block.

  • <td></td>
    This creates a table cell. These are always inside a <tr> block.

  • <th></th>
    This creates a header cell, which behaves just like a regular (<td>) cell, but can be formatted differently.

Here’s an example of a basic table:

example-7
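Here’s a sketch of a small two-row table; the cell contents are placeholders. The first row uses header cells, and the second uses regular cells holding paragraphs:

```html
<table>
  <!-- Row 1: header cells -->
  <tr>
    <th>Format</th>
    <th>File extension</th>
  </tr>
  <!-- Row 2: regular cells, each holding a paragraph -->
  <tr>
    <td><p>ePub</p></td>
    <td><p>.epub</p></td>
  </tr>
</table>
```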

And here’s how that would look:

image-6

Notice that the paragraphs are side by side now, instead of stacked one on top of the other. That’s because they’re part of the same row (<tr></tr>) block.

<h1></h1>

This is the tag to identify the most important header on a page. Traditionally, there should be only one per “page.” In ePub coding, this is often the tag used for a chapter title:

example-8
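A minimal sketch of a chapter opening, with a placeholder title and text:

```html
<!-- The chapter title: the most important heading on the page -->
<h1>Chapter One</h1>
<p>The first paragraph of the chapter begins here.</p>
```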

Here’s how that would look:

image-7

<h2></h2>, <h3></h3>, etc.

These tags set off lower-priority headings — chapter sub-titles, section heads, etc. (In publishing parlance, these are called A heads, B heads, and so on.)

example-9
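Here’s a sketch of how the heading levels might stack up under a chapter title (all headings and text are placeholders):

```html
<h1>Chapter One</h1>
<!-- A head: a chapter subtitle or main section -->
<h2>In Which Our Hero Learns HTML</h2>
<p>Text of the section.</p>
<!-- B head: a subsection within the A-head section -->
<h3>A Brief Digression on Tags</h3>
<p>Text of the subsection.</p>
```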

Here’s how that would look:

image-8

<hr></hr>

This displays a horizontal rule (a hairline, in print terms): a line that divides the page horizontally. Note that, since there isn’t ever any text between the open and close tags, you can combine them: <hr/>

<br></br>

This creates a line break: not a full paragraph break, but a forced move to the next line. To be honest, these are a last resort; you’re often better off using another block tag.

As with the <hr/> tag, the open and close tags are usually combined: <br/>
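A sketch of both tags at work, with placeholder text. The line break stays inside the paragraph, while the rule sits between two paragraphs:

```html
<!-- A forced line break inside a single paragraph -->
<p>Roses are red,<br/>
Violets are blue.</p>
<!-- A horizontal rule between scenes -->
<hr/>
<p>A new scene begins after the rule.</p>
```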

<img src="[file location]" alt="A description of the image"></img>

This tag will display an image.

The source (src) attribute tells the ebook reader where to look for the image file. The file location will either be a URL (on the open web — it will look like this: https://website.com/image.jpg) or a URI (a file inside the ebook or web page’s local file structure — it will look something like this: ../Images/image.jpg).[5]

The alt attribute gives the image a text description, which is important, for example, when sight-impaired readers use text-to-speech to have the book read aloud. The description should also display in place of the image if the image is supposed to load from the internet and the ereader isn’t connected.

As with the <hr/> and <br/> tags, the open and close tags are usually combined: <img src="[location]" alt="[description]"/>
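Here’s a sketch of both kinds of file location. The file names and descriptions are hypothetical; the first path assumes the common layout where images sit in an Images folder next to the folder holding the text files:

```html
<!-- A file inside the ebook, found by a relative path -->
<img src="../Images/cover.jpg" alt="The book's cover: a lighthouse at dusk"/>

<!-- An image on the open web, found by its full URL -->
<img src="https://website.com/image.jpg" alt="A description of the image"/>
```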

Others

There are a lot of other HTML block tags that you can use, especially in the newer ePub3 ebook standard, such as <section></section> and <aside></aside>, which ePub3 lets you label with the epub:type attribute (for example, <section epub:type="chapter"> or <section epub:type="frontmatter">). However, they are not frequently used at this point, and aren’t crucial to creating a well-designed ebook. If you would like to learn more about them, check out the IDPF accessibility guidelines, which offer a wonderful overview of the ePub3 format.

Now, I left out three mandatory block tags that appear in every ePub file, the <html></html>, <head></head> and <body></body> tags. The <html> tag is always the outermost block. The <head> and <body> tags are nested immediately inside, with the <head> tag coming before the <body> tag. To be complete, a file must look something like this:

image-9b
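Here’s a minimal sketch of such a file as a plain web page. It deliberately leaves out the extra pieces an ePub needs. The title and text are placeholders, and the first line (<!DOCTYPE html>) is the standard declaration that tells a browser it’s reading an HTML5 page:

```html
<!DOCTYPE html>
<html>
  <head>
    <!-- Information about the page; nothing in the head is displayed -->
    <title>Chapter One</title>
  </head>
  <body>
    <!-- Everything the reader sees goes inside the body -->
    <h1>Chapter One</h1>
    <p>The first paragraph of the chapter.</p>
  </body>
</html>
```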

That’s now an actual, complete HTML page! If you copy that into a text editor and save it as a file with the .html extension, you can open it with your web browser. Go you!

By the way, did you see the <!-- and --> tags in the <head> section? Those mark the beginning and end of a comment, a coder’s note that isn’t intended to be displayed.

We’d have to add a couple of things to make that code ebook-ready — I’ll be going into those in detail next time — but you could take that same file and import it into an ebook, and it would display perfectly.[6]

Inline Tags

These HTML tags are largely used for formatting. Many of them are meant to be paired with CSS, so I won’t touch on those this time.

These tags always appear nested inside of a block tag — usually a paragraph (<p></p>). A few that are important and incredibly useful are these:

<strong></strong>

You put this tag around text to make it bold. The older version of the tag (<b></b>) still works but is no longer recommended. Why? Because <strong> says the text is important, not just that it should look bold. What happens when you place one <strong> section inside another? The inner text is marked as even more important. Usually it’s just bolded, but some fonts or styles can do more with it.

<em></em>

You put this tag around text to make it italic. Again, the older version of the tag (<i></i>) still works, but it isn’t recommended for emphasis. Nested <em> tags can do some nifty things. When you put one <em> tag inside another (as in a foreign word or a book title inside of an italicized section), the inner text will display not as italic, but as roman (regular) type. (This won’t work on every browser or ereader, but it’s nice when it does.)
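A sketch of both tags inside a paragraph (placeholder text). The book title is an <em> nested inside another <em>, so on readers that support it, the title should flip back to roman:

```html
<p>She was <strong>absolutely</strong> certain she had read
<em>all of <em>Moby-Dick</em> in one sitting</em>.</p>
```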

<a href=”[file location]” ></a>

This is the tag that made the web what it is and that makes an ebook unlike its print cousin: the hyperlink. Anything within the tag — whether it’s actual text or an image tag — becomes a link.[7]

You can either link to an outside web page (a URL) or link to a file (or location within a file) within the file structure of your ebook (a URI).

Here’s how you’d create a hyperlink:

example-10
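Here’s a sketch of the three common cases. The web address, file names, and link text are hypothetical. Each link sits inside a paragraph, since <a> is an inline tag:

```html
<!-- A link to a page on the open web (a URL) -->
<p>Read more at <a href="https://website.com/page.html">the author's website</a>.</p>

<!-- A link to another file inside the ebook -->
<p>Go on to <a href="chapter2.html">Chapter Two</a>.</p>

<!-- An image wrapped in a link becomes a clickable button -->
<p><a href="chapter1.html"><img src="../Images/button.jpg" alt="Start reading"/></a></p>
```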

And it would look something like this:

image-9

<sup></sup>

This makes the text superscript — that is, it shifts the text up (and usually makes it smaller).

<sub></sub>

This makes the text subscript — that is, it shifts the text down (and, again, usually makes it smaller).

<span></span>

This defines a section within a block of text. It can be used for identification purposes, or, most commonly, for formatting the text. We’ll be going into this one in depth next time.
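A sketch of these three tags, with placeholder text. On its own, the <span> changes nothing visible; it only marks off those words so that CSS can style them later:

```html
<!-- Subscript and superscript -->
<p>Water is H<sub>2</sub>O, and the area of a square is s<sup>2</sup>.</p>
<!-- A span marks off part of a paragraph for later styling -->
<p>Only <span>these three words</span> are inside the span.</p>
```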

Identity code: the ID attribute

By the way, by adding an id attribute, you can turn any of these tags into anchors — locations within the file. You point an <a> tag at the anchor by adding a pound sign (#) and the id to the file location in the href attribute.

So let’s say we gave an image tag an id of image (imaginative, right?):

example-11

If I wanted to link to the location of the image in a file, I’d create a hyperlink:

example-12
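Here’s a sketch of the whole round trip. The image path and the file name chapter1.html are hypothetical. A link in the same file needs only the pound sign and the id, while a link from another file puts the file name in front:

```html
<!-- In chapter1.html: the image carries a unique id -->
<img id="image" src="../Images/image.jpg" alt="A description of the image"/>

<!-- Elsewhere in chapter1.html: jump to the image -->
<p><a href="#image">See the image</a></p>

<!-- From a different file in the ebook: file name, pound sign, id -->
<p><a href="chapter1.html#image">See the image in Chapter One</a></p>
```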

By the way, each id must be unique in the file: you can’t have more than one id="image".

You can use this to create back-and-forth reciprocal links — something I do all the time when I’m creating footnotes. Each <a> gets its own id; you link back and forth. Let’s say you put this into the file chapter1.html:

example-13

and this into the file notes.html:

example-14
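Here’s a sketch of such a pair. The ids note-reference-1 and note-1 are hypothetical names. The key is that each <a> carries its own id and points at the other’s:

```html
<!-- In chapter1.html: the reference links to the note and has its own id -->
<p>To be, or not to be: that is the question.<a id="note-reference-1" href="notes.html#note-1"><sup>1</sup></a></p>

<!-- In notes.html: the note links back to the reference -->
<p><a id="note-1" href="chapter1.html#note-reference-1">1.</a> Hamlet, Act 3, Scene 1.</p>
```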

You would be able to click on the footnote link after the quote to go to the note, and then click on the link at the beginning of the note to go back to the quote. Whew!

Obviously, there’s a lot more to learn about HTML. A great free resource is the online reference W3Schools.com. They’ve got a complete and up-to-date listing of every tag, attribute, and weird wrinkle, not just for HTML, but for CSS and JavaScript. There are tutorials, examples, and all sorts of great information.[8]

Next time, I’ll be talking about how to make our ebooks beautiful. Time to bring on the style!


[1] Okay. So the align attribute is deprecated — that means it’s essentially obsolete and shouldn’t be used. There are better ways to handle alignment, as we’ll learn in the post on CSS!
[2] This is why I don’t edit code in word processing software like Microsoft Word or OpenOffice.
[3] Okay. So block and inline are also possible values for the CSS display property. As are inline-block (!), none, and a few more. But honestly? You don’t care about that yet. You may never.
[4] Though many of them can be.
[5] See the discussion on what’s inside of an ebook for my discussion on URIs and URLs.
[6] It would display, but it wouldn’t validate — you couldn’t open the file on an ereader.
[7] You know all of those buttons you’ve been clicking on in web pages and apps? Those are just images with hyperlink tags around them. (Well, okay, sometimes they’re more complicated than that, but that’s the concept.)
[8] Just remember, when you’re researching for ebooks, you want to use the latest HTML5 syntax. And the W3Schools references will always let you know if a tag is no longer recommended.

 
Photo: Pixabay


17 Comments

  1. Louise

    Hiya. Thanks for this information. Silly question (I’m a beginner) what’s the best software to use for this? Do I add the code in Jutoh or Word or Scrivener?

    • Noel Morado

      A simple Notepad is enough to insert tags or codes in your content. The idea is to style your text with codes and identify each text block with html tags. Creating ebooks is different from the usual word processing we have.

    • rutvik ahir

      Notepad++ is best…

  2. J. T.

    In your clear and thankfully understandable discussion of reciprocal anchor/links, you seem to be missing an attribute …id=”note-reference-1″… in “chapter1.html” to get back to the text link from the footnote, no?

  3. Cat Michaels

    David, I’m a non-techie author who’s struggling to write simple code for my blog’s outro on Weebly instead of cutting and pasting the same information, links every week…a time-consuming 15 minutes of hair-pulling. I’ll give your post a close read and try your suggestions for my next outro. If I can get html code to work, I’ll owe you a huge attaboy and many thanks -:D.

    • David Kudler

      Cat, good luck! That’s how I learned HTML, back in the day — trying to make a Livejournal blog look passable.

  4. Patrick Samphire

    A good introduction to HTML. A few minor corrections: There’s no such thing as and

    . These are ’empty’ elements and are self-closing. You can write OR (in XHTML), for example, but NOT .

    Also,

    stands for ‘Horizontal Rule’, not ‘hairline’.

    • Patrick Samphire

      Well, that didn’t work as it stripped all the HTML elements out of the comment. I shall try again replacing the angle brackets with square brackets:

      A good introduction to HTML. A few minor corrections: There’s no such thing as [br][/br] and [hr][/hr]. These are ’empty’ elements and are self-closing. You can write [br] OR [br /] (in XHTML), for example, but NOT [br][/br].

      Also, [hr] stands for ‘Horizontal Rule’, not ‘hairline’.

      • David Kudler

        Thanks, Patrick!

        Yes, I understand that no one ever uses <br></br> or the others. But they are in fact perfectly correct HTML and will validate at both validator.idpf.org (for ePub files) and validator.w3.org (for all HTML, but intended for web pages). They’re just a waste of time and space. However, I wanted to make it as obvious as possible for a first-timer that there’s nothing fundamentally different between the tag for a line break and the tag for a paragraph.

        And thanks for the correction — you’re of course right that <hr> stands for “horizontal rule.” That’s my print background talking. :-p

        • Patrick Samphire

          I know I’m being picky (I know this is a flaw, believe me!), but including the closing br and hr tag is only valid in XHTML. It’s forbidden in HTML3.2, HTML4 and HTML5. The specs are pretty clear on it. But of course it will work and does tend to pass validators, so like I said, this is just me being picky.

      • Ben

        Actually it’s not so much that it stripped those tags as much as WordPress allows a subset of HTML tags in comments (which is how I added the emphasis in my earlier comment), so you’d need to use the old ampersand escape method to display the less than and greater than brackets or use a pre-formatted code block if that much is available (I can’t recall).

        • David Kudler

          And yes: discussing HTML on a WordPress site is a joy, isn’t it? ;-)

          • Ben

            Perhaps Joel can be convinced to install the markdown editing plug-in … though I can’t recall if it works with comments or just posts. I’m assuming, of course, that he won’t open up that REST API plug-in (and future core feature) for random comments from the Big Bad Internet.

          • David Kudler

            Would it totally destroy my geek cred, Ben, to say that I don’t like markdown? Makes me feel like I’m editing Wikipedia pages. ;-)

  5. Ben

    Tables are a terrible way to display text, images and objects side by side. You should be using floats instead (presumably that’ll put in an appearance in the CSS instalment).

    As far as no warnings on the writing raw [X]HTML go, speak for yourself … I’ve got all that and the XML covered very nicely with oXygenXML Editor (I shouldn’t need to tempt moderation by linking to it, everyone should be able to guess the URL from the name of the editor). It goes above and beyond, though; like creating fully searchable webhelp sites filled with my background material for any given fictional setting just by using the same XML base as the fictional work itself.

    Why?

    Well, because I can. Unlike Scrivener, Calibre, Sigil and so many others it will produce valid EPUB 3.0 files without any vendor specific extra crap (hello iBooks Author), or misappropriated MARC Relators (hi Scrivener), or invalid ID attributes (Aloha Smashwords and probably Calibre too). As for my fellow geeks; Sphinx doesn’t cut it, too many validation errors (usually more errors than lines inside); Pandoc cheats gratuitously (basically turns it into a single HTML page with page breaks), Org-Mode relies on Pandoc, Writer2ePub still only does ePub 2.0 well, LaTeX is almost entirely print focussed and so on. For the professional publishing world, thou shalt not invoke InDesign (until it actually behaves properly with other XML schemas and doesn’t eat or add whitespace arbitrarily) and everyone else is playing catch-up.

    Well, in fairness it can produce two errors, but only one is detected by the epubcheck validator (and both are fixed in the OPF file; a duplication in the manifest and the modification timestamp may be off 50% of the time). Both of those get fixed in under 5 minutes, usually while tweaking the metadata anyway. The timestamp is the bit which is not detected, the manifest duplication is a result of the extra things Apple insists on to have covers display in iBooks or on an iPad (iOS device).

    Still, while I now consider oXygenXML essential, it definitely won’t be for everyone. Some XML or HTML knowledge is required and a willingness to learn, but in the XML world it is essentially the best option available (and runs on all three major platforms: Windows, OS X and GNU/Linux). Between that and my text editor of choice (GNU/Emacs) I’m pretty much set.

    • David Kudler

      Thanks, Ben! As I mentioned in a comment to the post on ePub conversion (and editing) apps, oXygenXML is in fact the Lamborghini of XML-editing packages — able to handle all flavors of HTML, including that boxed up as an ebook. It is without a doubt the industry standard.

      I tried it out a few years ago and found it indeed to be more than fully featured. However the learning curve was very steep, few of the features that were available that weren’t included in the editors that I used (Calibre, Sigil, and — for unpacked files — Dreamweaver) seemed applicable to ebook design, there were some missing tools that I’d come to rely on, and the price was too much for me to justify at $549 (without support). However, that was a while ago; my understanding of HTML and XML is deeper now than it was, and I’m sure that oXygenXML has grown. I’ll try to check it out again.

      I use Calibre and Sigil to produce valid ePub3 files every day. (Sigil was woefully behind in this regard, but has caught up quite nicely. Up until the last six months, I stuck to ePub2 for most ebooks anyway, but since NookPress now allows the newer standard, I feel as if I can create one file that my clients and I can distribute to everyone.)

      And you’re absolutely right that tables can be an enormous pain. Large multi-row text tables become messes in reflowable ebooks; once the ereader tries to split the table across pages, either the ebook breaks or the table stops displaying correctly. In such cases, I’ve often found that using an image of the table that will resize to one screen works better than any other solution I’ve found. Not an optimal answer by any means.

      Using a float CSS value can be a great way to handle that, but you have to be very careful if you want to line things up — you have to count out the number of “columns” you’re trying to create for each line, and you ought to add a number of other values — such as width, margin, and possibly border. But indeed, it will work better than a table — especially if you create a linked CSS stylesheet and use classes to create consistent styling to streamline the coding (which is the topic of my next post).

      Of course, some older ereader apps handle float even less consistently than they handle table. But that’s less and less an issue with each passing year, as most people update — at least to the current decade.

      • Ben

        Over $500 USD, sounds like the professional license. I think you might want to have another look at this particular lamborghini (the only such beast I can actually afford). They now have a personal license at $199 USD (or $280 with 2 years support/upgrades) which is identical to the professional (full editor, author, developer) and can be used commercially, but must be paid for by the individual using it, not their employer. So it specifically meets the needs of freelancers and solo operations, while professional licensing nabs the SME market and enterprise is IBM and Oracle land. It’s that personal license that brought me into it and it’s pretty much the only proprietary software package I have no qualms supporting entirely (and since I’m on the dev team of a certain GNU Project security product, that’s not said lightly).

        It was Sigil’s collapse in development that spurred me to go and find something that could handle EPUB 3 in the first place. Which is how I was directed to oXygen, DITA and D4P which are now the foundation of my deployed publishing platform. Mind you, I’ve got a significant amount of professional geekery experience and it still took me 6 to 9 months to get a firm enough grip on the thing to be sure of it and the scalability (which it turns out is extraordinarily scalable in both scope and deployment contexts).

        Mind you, D4P does make it much easier to support multilingual products if necessary and my to do list already includes at least SSML/TTS to try to meet the needs of the DAISY Consortium. That’s the sort of thing that is motivating me to leave EPUB 2.0 in the past. It’s not a total dealbreaker (I’m too small a publisher at this point for it to be a legal requirement), but I’d like to be able to do it well before I’m in a position where I have to. I’d like to add braille output for much the same reason, but while grade 1 isn’t much more than transliteration, grade 2 is quite another beast.

        Besides, I’ve already done the easy part: confirmed that consistently good output for poetry and song lyrics is not a huge chore (I’m not a poet, but all the poets I know seem to struggle with this bit and stick with print so I made sure I had it covered on the way through). That’s without having to resort to fixed layout, monospaced fonts, cheating between paragraph breaks or embedded SVGs too.

        Although I was also a little lucky with the timing of this discovery because the lead dev for the D4P specialisation was paid by a certain “Big 5” company (which may or may not be owned by a company with strong ties to cable television and newspapers which is in turn owned by a certain expat Aussie, turned Yank) to update the EPUB plugin from 2.0 to 3.0 last year and that work was finishing just as I was discovering it (so I kinda beta tested and bug hunted it a bit, but the benefits are … considerable. Their deployment is customised for their print output (it feeds back into InDesign workflows), while I customised my own deployment (mainly to avoid InDesign like the plague, while still making it look decent). Plus I published an EPUB 3 with it before the people who paid for it (my “live test” was released in January, their first use of it was in May and they certainly got the print builds working before me).

        So not really a starter package, but it’s easily proof that high quality is not actually limited to the well funded major and traditional publishers. Both the DITA-OT and D4P are open source (on github, of course); it’s just that oXygenXML Editor makes them so much less painful to deal with (both frameworks are integrated into the editor with the XSLTs and other transformation frameworks). So currently the top of the “to do” list is finalising print processes (I don’t see how the PDF/X-3 format manages to cause so much consternation either, I guess the geekery helped there too, but no matter).

        Oh, BTW, check your risuko email (mine is predictably easy to guess from my linked website) … there’s a little beta/trial license for you to play with (18.1 is in beta now). So I guess your timing is pretty good too. ;)


Trackbacks/Pingbacks

  1. General Advice for Self-Publishers | Drinking Cafe Latte at 1pm - […] an .epub3 file using HTML if you really want tight control of your book’s presentation. Here’s an article that…
  2. Speaking in Code: Ebook HTML basics | Stillpoint Digital Press - […] This post originally appeared on Joel Friedlander’s wonderful site, TheBookDesigner.com. […]
  3. Top Picks Thursday! For Readers & Writers 10-6-2016 | The Author Chronicles - […] Writers are business people as well as artists these days, and there’s a lot of important minutiae to understand.…
  4. Speaking in Code: Ebook HTML basics  | Ebo... - […] An ebook is just a website in a box. In order to know how to get in and edit…
  5. Speaking in Code: Ebook HTML basics - The Book ... - […] An ebook is just a website in a box. In order to know how to get in and edit…
