First steps with HTML5

As I mentioned in my earlier post, HTML5 means quite a lot more than what we all understood markup to be in the HTML 4/XHTML days. At the core of the HTML5 specification, however, markup is still the foundation. Let’s take a quick look at some of the differences between HTML5 and its predecessors. But before we get too far into this HTML5 series, I should mention that the principal reason I’m posting my thoughts on the subject is to learn for myself, and secondly to document a nugget of information or two that might be useful to you all out there. Excellent stuff has already been written on this subject: Bruce Lawson and Remy Sharp’s book Introducing HTML5, Mark Pilgrim’s HTML5: Up and Running, and Tantek Çelik’s HTML5 Now. Go read every one of them cover to cover. I highly recommend them.

It can scarcely be denied that the supreme goal of all theory is to make the irreducible basic elements as simple and as few as possible without having to surrender the adequate representation of a single datum of experience.

—Albert Einstein

Einstein and Tagore, Berlin, 1930
Physicists with mad hair who most assuredly would have been totally down with HTML5.

I will now walk through the reasoning Pilgrim and Lawson follow in constructing the optimal basic HTML5 markup template, with my own commentary, flavor, and style thrown in, partly so it sticks in my own head and hopefully so it helps you all follow along:

The first line of code we see in HTML documents is the doctype. As all good web developers already know, the reason it is standard practice to include a doctype is to trigger standards mode in web browsers. All major modern browsers support this behavior, which makes it much simpler for a web developer to deliver a consistent user experience across platforms. If you’re familiar with the doctypes of HTML 4 and XHTML 1, you know how complex they appear.
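For reference, here are the Strict doctypes for HTML 4.01 and XHTML 1.0. Choosing the Strict variants (rather than Transitional or Frameset) is an assumption made here for illustration:

```html
<!-- HTML 4.01 Strict -->
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
  "http://www.w3.org/TR/html4/strict.dtd">

<!-- XHTML 1.0 Strict -->
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
```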

Not the type of thing one would readily commit to memory at first glance. The HTML5 doctype is significantly shorter:
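As a sketch, this is the doctype in the form the HTML5 specification gives. It is case-insensitive; the uppercase spelling used here is only a common convention:

```html
<!DOCTYPE html>
```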

There – fixed it.

With just a little experimentation, it was found that all browsers triggered standards mode with just that minimal amount of code. So the HTML5 spec was written to codify what already existed, in the most compact and simple way that works. No long URL. No unmemorable string of voodoo or versioning cluster nonsense. Just the standards mode, please. That’s all we need – something simple and to the point.

The next thing you should know about is defining a character set for your document. This is a brief detour from our otherwise straightforward journey into HTML5 markup, but it matters for security: when no encoding is declared, browsers try to guess one, and a wrong guess can let specially encoded content slip through as malicious script. So let’s get off on the right foot, shall we? First, the old way:
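A typical HTML 4-era declaration looks something like the following. The choice of UTF-8 as the encoding is an assumption for the sake of the example:

```html
<!-- Pre-HTML5: encoding declared through an http-equiv pragma -->
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
```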

Again, not the most memorable code. But then, that’s why I like HTML5: It fixes things. This is much better:
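A minimal sketch of the HTML5 form, again assuming UTF-8 as the encoding:

```html
<!-- HTML5: the charset attribute goes directly on the meta element -->
<meta charset="utf-8">
```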

I should also point out something important here. In HTML5, you don’t need to place quotation marks around an attribute value as long as it contains no spaces (nor any quote marks, equals signs, angle brackets, or backticks). Spaces separate multiple values, and you need those quotes to herd them together and distinguish them from standalone attributes (which we’ll get into later). So, this would also be perfectly valid:
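Here is the same declaration with the quotes dropped, plus a contrasting case where they are still required. The class names in the second line are made up for illustration:

```html
<!-- Valid: the value contains no spaces, so quotes are optional -->
<meta charset=utf-8>

<!-- Quotes still needed: the value holds two space-separated tokens -->
<p class="intro lead">Hello</p>
```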

If you’re compressing a document for speedy delivery over slow networks, such as mobile contexts, then here’s a place to save a few bytes. But in general, it’s my personal preference to quote my attribute values for legibility’s sake.

Another thing you might notice is that this standalone element is not self-closing in the XML sense. You could write it that way, as in:
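A sketch of the XML-style form, assuming the same UTF-8 declaration as before:

```html
<!-- XML-style syntax: note the slash before the closing bracket -->
<meta charset="utf-8" />
```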

That’s with the trailing slash before the closing angle bracket, in case you missed it. This is how it would be written were our document conforming to XHTML rules. XHTML5 is the XML-conformant variant of HTML5 and is defined as an optional syntax in the HTML5 specification. But it is not necessary unless you really need XML parsing to be enabled. And the good news is, HTML5 allows for SVG and MathML embedding without having to switch to XHTML mode, so for most contexts, even at the scientific level, we won’t need the X tacked on to the front of our HTML5. But please, don’t let that stop you from adding the trailing slash to standalone elements like this one. I myself only recently got out of the habit, after writing HTML5 for the past 10 months or so. It’s perfectly valid either way.

The rest of your basic HTML5 document at this point will look very familiar, with just a few things to point out. The most notable difference will be the opening HTML tag. Usually we’d just open up our markup tree with this:
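In its barest form, that opening tag carries no attributes at all:

```html
<html>
```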

However, if we were pulling out all the XML stops, we would be defining a namespace and a language, like so:
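An XHTML 1-style opening tag usually looks something like this. The language value "en" is an assumption chosen for illustration:

```html
<!-- XHTML 1: namespace plus the language declared twice -->
<html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en">
```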

In HTML5, it is certainly not necessary to define a namespace (the xmlns part) because that much is assumed. That leaves us with the language declaration, in the form of the lang attribute. Lang attributes are specified according to IETF BCP 47, and MSDN has a practical list of the codes that may be used. A lang attribute is used by search engines to better understand the content and categorize the results. It is used by speech synthesizers to produce the correct pronunciation of words with similar spellings across languages. It is used by browsers for producing the correct hyphenation, spelling correction, and so on, even across regional dialects. A lang attribute specifies the language of the contents of the element it is set on, which means you may specify several languages within a single document.

Do use the lang attribute. Even better – use it regionally. I would specify lang="en-us" (English – U.S.) for most of my web work, but on occasion I’ll dip into Traditional Chinese for my language studies, with specific vocabulary rules for Taiwan, in which case I’d use lang="zh-tw" (Chinese, or “zhongwen” – Taiwan).

I’m fascinated by language processing and character sets in computing, so forgive my overly thorough description of the situation back there. The point is, in HTML5 we can shorten the opening HTML element by removing the namespace and the xml:lang attribute, and by setting lang to my regional preference:
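With the "en-us" value mentioned earlier, the shortened tag would look like this. Language tags are case-insensitive, though BCP 47 suggests writing the region in uppercase, as in "en-US":

```html
<html lang="en-us">
```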

There, that’s a gooood HTML element. For other elements in your document, such as perhaps LI or P, you might specify additional languages as needed.
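As a rough illustration of per-element languages, a list inside an English document might mark one item as Taiwanese Mandarin. The list content here is invented for the example:

```html
<ul>
  <!-- Inherits lang="en-us" from the html element -->
  <li>thank you</li>
  <!-- Overrides the inherited language for this item only -->
  <li lang="zh-tw">謝謝</li>
</ul>
```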

Bruce Lawson has a nice, clear writeup of what he considers to be the minimal HTML5 document framework. I agree with this markup template, with my own minor stylistic modifications presented below:
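A sketch of such a minimal document, assembled from the pieces discussed so far. The title text, body content, and indentation are placeholders, and the exact stylistic choices are assumptions:

```html
<!DOCTYPE html>
<html lang="en-us">
<head>
  <meta charset="utf-8">
  <title>Page title</title>
</head>
<body>
  <p>Some content.</p>
</body>
</html>
```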

The rest is pretty straightforward, right? We have the overall wrapping HTML element, a HEAD, a BODY, our charset definition, our lang attribute set to Amurikun, well-formed tag organization, a title element, and some content. That’s it – not too different from our past experiences with HTML 4 and XHTML, but arguably much simpler. You can now fill in the rest of your markup as if it were HTML 4.01, and it’ll work in all modern browsers. That’s right, it’s OK to get started with this much right away. But if we stopped there, we would be missing the point of the new semantic conveniences of HTML5! So in the next post we will explore those constructs in a little more detail and talk about how they will save you time and make more sense for web development in the long run.