Semantics schmemantics

On the train back from Eindhoven to Amsterdam, after meeting Simon Dogger for the first time, I thought about giving up the web completely. We’ve been doing it wrong all these years, I thought. It cannot be fixed, was my conclusion. This was quite a depressing moment. What had happened?

Just before, Simon had demonstrated how the web sounds to him. He had tried to do some common web tasks, like ordering groceries, transferring money and watching a video. And he had failed miserably at all of them. In part this was due to the large amount of superfluous content on all the websites he visited. And in part it was due to the way his screen reader explains all this content.

Semantic HTML

Screen readers don’t just read the visible content on a web page out loud to the user, they also explain the meaning of the content. So when something is a link or a form element the screen reader will say so, so the user knows that they can interact with the element. But not only interactive elements are explained. Things that explain the structure of a page, like heading levels, navigation and lists, are explained as well. The title of this page will be pronounced as Heading level 1: Semantics schmemantics. Léonie Watson does an excellent job at explaining why this is such a powerful feature:¹

HTML semantics are therefore important in two ways: We get a consistent understanding of content structure and native behaviour, and we get a common understanding of the content’s meaning and purpose. The best thing of all, is that we get those things for free whenever we use HTML as intended.

This is exactly what I had always assumed. Semantics can give a screen reader user an understanding of the content structure. Observing Simon left me confused at first. To him, hearing the semantics of elements did not help him understand the meaning and purpose of the page. At all. The only thing it did was add even more noise to an already cluttered page. Instead of helping him, it only confused and irritated him. What is a navigation and why does every page start with it, instead of with the content I expect to find? What does all this heading level 5, 2, 4, 3 mean and why does everybody put that in their pages?

Expert users vs laypeople

The main reason why Simon doesn’t understand semantics is simple. Like most people on this planet, Simon is not a web content expert. Words like navigation, or heading level, are simply not part of his vocabulary. Instead of making web pages easier to understand, hearing this jargon over and over again makes them more complicated.

Do we expect everybody who uses the web to understand semantics on an expert level? No, of course we don’t. We make sure the semantics are visually clear. You don’t need to know that a navigation is called a navigation when you see one. And you don’t need to know that a heading is called a heading and that it’s of the second level. We simply see the hierarchy. It stands out because it’s styled like a heading. We don’t need to know the word in order to understand it.

How would you pronounce headings when you read out this page? You would probably pronounce them a little bit differently than a paragraph. Maybe you add a bit of emphasis. And maybe you add a pause before and after the heading. You could consider this to be aural styling of headings.

Screen readers lack the tooling to properly style headings.
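To make the idea of aural styling a bit more concrete, here is a rough sketch of what it could look like if headings were expressed as pauses and emphasis instead of spoken labels. It generates SSML, the W3C markup language for speech synthesis, using its `break`, `emphasis` and `p` elements. Everything else is an assumption made for illustration: the `Block` type, the `toSsml` and `pauseFor` functions, and the pause lengths are hypothetical, and nothing here claims that existing screen readers accept SSML as input.

```typescript
// Hypothetical sketch: these names do not come from any real screen reader API.
type Block =
  | { kind: "heading"; level: number; text: string }
  | { kind: "paragraph"; text: string };

// Escape characters with a special meaning in XML, so that page text
// cannot break the generated SSML.
function escapeXml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Assumption: more important headings get a longer pause.
// The numbers are arbitrary: level 1 gives 800ms, level 6 gives 300ms.
function pauseFor(level: number): number {
  const clamped = Math.min(Math.max(level, 1), 6);
  return 900 - clamped * 100;
}

// Headings become "pause, emphasised text, pause". Paragraphs are spoken plainly.
function toSsml(blocks: Block[]): string {
  const parts = blocks.map((block) => {
    const text = escapeXml(block.text);
    if (block.kind === "heading") {
      const pause = `<break time="${pauseFor(block.level)}ms"/>`;
      return `${pause}<emphasis level="strong">${text}</emphasis>${pause}`;
    }
    return `<p>${text}</p>`;
  });
  return `<speak>${parts.join("")}</speak>`;
}

const ssml = toSsml([
  { kind: "heading", level: 1, text: "Semantics schmemantics" },
  { kind: "paragraph", text: "Screen readers explain the meaning of content." },
]);
console.log(ssml);
```

In this sketch the words "heading level 1" are never spoken. The hierarchy is carried only by the length of the pause and the emphasis, much like a person reading the page out loud would do.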

What can we do about this?

I could have tried to teach Simon a thing or two about semantics and about using more features of his screen reader. I could have changed a few settings in his screen reader for him as well, in order to make it less verbose. This might have solved a few issues for Simon personally, but it would not solve anything for all the other people like Simon.

My first reaction when I saw that semantics confuse people like Simon was to stop using semantics. And when you look at the prototype I made for him you’ll see that it consists of only a few paragraphs, links and one single button. To Simon this was a relief. Finally a website that’s not shouting incomprehensible words at him.

I discussed this idea of completely leaving out semantics with Léonie Watson and with Bram Duvigneau. To them, both expert screen reader users, these semantics really help in getting a better understanding of a webpage, and they help them navigate webpages more easily. Not using semantics at all would completely break the web for them.

The solution is not in the way we write our HTML, it’s in the tooling.

We need better screen readers

A quick fix would be to change the default settings of screen readers. By default, instead of all semantics, they should only speak out the behavioural semantics. Knowing that something is a link is essential. Knowing that something is a heading is handy, but it also requires expert knowledge that you can’t expect laypeople to have.
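As a minimal sketch of such a default, assume a screen reader that decides per element whether to speak its role. The `Role`, `PageElement` and `Verbosity` types and the `announce` function below are hypothetical and do not mirror the settings of VoiceOver, NVDA or any other screen reader. They only illustrate the difference between announcing all semantics and announcing only the behavioural ones.

```typescript
// Hypothetical model: illustrative names only, not a real screen reader API.
type Role = "link" | "button" | "heading" | "paragraph";

interface PageElement {
  role: Role;
  text: string;
  level?: number; // only meaningful for headings
}

// "all" is the expert setting, "behavioural-only" the proposed default.
type Verbosity = "all" | "behavioural-only";

// Behavioural roles tell the user that they can interact with an element.
function isBehavioural(role: Role): boolean {
  return role === "link" || role === "button";
}

function announce(element: PageElement, verbosity: Verbosity): string {
  // Interactive elements are always announced with their role.
  if (isBehavioural(element.role)) {
    return `${element.role}: ${element.text}`;
  }
  // Structural roles are only spoken at the expert setting.
  if (verbosity === "behavioural-only" || element.role === "paragraph") {
    return element.text;
  }
  if (element.role === "heading" && element.level !== undefined) {
    return `heading level ${element.level}: ${element.text}`;
  }
  return `${element.role}: ${element.text}`;
}

const page: PageElement[] = [
  { role: "heading", level: 1, text: "Semantics schmemantics" },
  { role: "paragraph", text: "Screen readers explain the meaning of content." },
  { role: "link", text: "Understanding semantics" },
];

for (const element of page) {
  console.log(announce(element, "behavioural-only"));
}
```

The only design choice in this sketch is where the default sits: expert users would switch to `"all"`, while everybody else starts with `"behavioural-only"` and never has to open a settings panel.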

It’s true that you can change the level of verbosity in the settings of your screen reader. But software settings are not something that non-experts change that often. They are often quite complicated to use. They can be daunting.

If you want to tell VoiceOver on the Mac to stop reading out all headings, you would first have to find and open VoiceOver Utility.

And then tab to the menu item *Verbosity* (and understand that this is the place to change the setting, and not *Speech*, *Web*, or *Sound*).

You will have to understand that you have to stay on the Speech panel, and then click on the button that’s labeled *Additional speech verbosity options, collapsed, disclosure triangle*.

This will open a long table in which you can change the verbosity of heading levels. Which again is rather complicated. All in all, there are many steps, and many more possible wrong roads to take along the way. Now try to imagine what it would be like if you were trying to do all this without being able to see.

You cannot expect people who are not experts at using their computer and screen reader to understand how to change these settings.

One option would be to reconsider the way that changing settings works. Instead of a jargon-filled maze, changing settings could be turned into a more user-friendly interaction, like a conversational interface for instance.

Smart defaults

However you present them, I think that screen readers should reconsider their default settings. The default settings should be made for normal people who need a screen reader, and not for expert users. Normal people will get frustrated when software is needlessly complex, and give up using it. This is the opposite of what we as inclusive designers want to achieve. Simon for instance only uses his computer if he really has to, because he absolutely hates the way it works. An expert user will get frustrated as well, but chances are higher that they know how to change some settings.

Changing the default settings may help in removing complexity for laypeople, but it will not solve the problem of styling heading levels. If heading levels are not spoken out as such, then there will be no difference between a heading and a paragraph. This could be very confusing as well.

Next level screen readers

There is a lot more to improve when it comes to the user experience of screen readers. It would be interesting to see what would happen when UX design universities started experimenting with the open source screen reader NVDA. What happens when indeed you change the default settings like I suggest? Would it be possible to add a little bit of intelligence? If a screen reader were a bit opinionated it could for instance ignore annoying patterns like the navigation.

Maybe a next step for screen readers would be to listen to storytellers, and people who read books to children. How do they emphasise a next chapter? What are the stylistic details they use? How do they make sure hierarchy is clear? Would it be possible to add more emotion to screen reader voices? Could you for instance translate the visual style of a website into something like an audible tone of voice? And could we please create a standard way of making screen readers laugh?

So screen readers should reconsider their settings. And it’s clear there is much room for innovation in this field. In the next chapter I explain that a change in design attitude could help as well.

¹ Léonie Watson. Understanding semantics. Blog post. 2016. tink.uk/understanding-semantics/