An introduction to the semantic web

#WebDev #MetaData #Web3 #SemanticWeb

Published on 9 November 2023 by Andrew Owen (5 minutes)

One of my predictions for 2023 was that there would be a lot more talk about Web 3.0. I couldn’t have been more wrong. Global events and the rise of AI have completely overshadowed web developments. But it’s still a topic worth some consideration.

The term Web 2.0 was coined by Darcy DiNucci in 1999. Web 1.0 was subsequently invented as a term to describe the earlier period. There is no fixed delineation between the two eras. The first is generally thought to have lasted from 1989 to 2004 and featured mainly static content. The second is generally thought to have started when social media profiles replaced personal web pages.

But I’d draw a different distinction. I think of Web 1.0 as everything before HTML5 (in other words, the Flash era). When the iPhone was announced in January 2007, it initially wasn’t supposed to run native apps, apart from the basic set included on the device. Instead, it was supposed to run web apps (written with Ajax). The first public draft of HTML5 was published the following year.

Web 3.0 means different things to different people. It’s sometimes used as an alternative term for Web3, which is an idea for a version of the web featuring decentralization, blockchain and token-based economics. It’s where non-fungible tokens (NFTs) came from. But it’s not well-defined or widely adopted.

In his 2000 book “Weaving the Web”, Tim Berners-Lee described a vision where computers:

“…become capable of analyzing all the data on the Web – the content, links, and transactions between people and computers. A ‘Semantic Web’, which makes this possible, has yet to emerge, but when it does, the day-to-day mechanisms of trade, bureaucracy and our daily lives will be handled by machines talking to machines. The ‘intelligent agents’ people have touted for ages will finally materialize.”

The semantic web is also sometimes known as Web 3.0. But Berners-Lee wasn’t the first to have this vision. Arguably, it started with Ted Nelson and Project Xanadu in 1960. And the ideas that influenced it can be traced back even earlier, to Vannevar Bush’s 1945 article “As We May Think”. For a deeper dive, I recommend watching Douglas Adams’s 1990 documentary “Hyperland”, which predates the World Wide Web and the first web browser.

Hyperland does a remarkable job of predicting the modern internet, developments in virtual reality and software agents like Siri (although Tom Baker’s agent is rather more configurable). But I’m in the minority camp that thinks that Xanadu would have been better than what we have now. Its original rules stated that every document can:

Consist of any number of parts, each of which may be of any data type.
Contain links of any type, including virtual copies to any other document in the system accessible to its owner.
Contain a royalty mechanism at any desired degree of granularity to ensure payment on any portion accessed, including virtual copies of all or part of the document.
Have secure access controls.
Be rapidly searched, stored and retrieved without user knowledge of where it is physically stored.

Every server, user, document and auditable transaction would be uniquely and securely identified. Documents would automatically be moved to physical storage appropriate to frequency of access from any given location. Documents would automatically be stored redundantly to maintain availability even in case of a disaster. Blockchain will have a role to play if we ever get there.

That leaves us with the semantic web, which exists now. But what are semantics? The term is related to the older word semiotics (the interpretation of signs and symbols). It can mean the study and classification of changes in the meaning of words, or the branch of semiotics that deals with the relationship between signs and what they refer to. But in web terms, we’d probably just call it metadata.

One way of adding metadata to web pages is the Open Graph protocol. It was originally developed by Facebook (now Meta) for use with its Social Graph mapping and tracking tool. Meta uses it so that any web page can have the same functionality as any other object on Facebook, and other social networks use it too. The basic metadata includes:

og:title - The title of the object as it should appear within the graph. Example: “The Rock”.
og:type - The type of object. Example: “video.movie”.
og:image - An image URL that should represent your object within the graph.
og:url - The canonical URL of your object, which will be used as its permanent ID in the graph. Example: “https://www.imdb.com/title/tt0117500/”.
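To see how a social network might consume these properties, here’s a minimal sketch in Python that pulls the og: tags out of a page using the standard library’s html.parser and reports which of the four properties the protocol marks as required are missing. The class and function names are my own, and real scrapers do a lot more than this, so treat it as an illustration rather than how any particular service works.

```python
from html.parser import HTMLParser

# The Open Graph protocol lists these four properties as required on every page.
REQUIRED_OG = ("og:title", "og:type", "og:image", "og:url")


class OpenGraphParser(HTMLParser):
    """Collect og:* properties from <meta property="..." content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.properties = {}

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <meta ... /> are also routed here by default.
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("property") or ""
        content = attributes.get("content")
        if name.startswith("og:") and content is not None:
            # Keep the first value: some properties (e.g. og:image) may repeat.
            self.properties.setdefault(name, content)


def read_open_graph(html):
    """Return the og:* properties found in `html` and the required ones that are missing."""
    parser = OpenGraphParser()
    parser.feed(html)
    parser.close()
    missing = [name for name in REQUIRED_OG if name not in parser.properties]
    return parser.properties, missing


page = """<head>
  <meta property="og:title" content="The Rock" />
  <meta property="og:type" content="video.movie" />
  <meta property="og:url" content="https://www.imdb.com/title/tt0117500/" />
</head>"""
properties, missing = read_open_graph(page)
print(properties["og:title"], missing)  # The Rock ['og:image']
```

In this example the page has no og:image, which is exactly the case where a share preview has to guess which picture to use.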

On this site, which is built with Hugo, I include these tags in the head.html partial:

<meta property="og:title" content="{{.Title}}" />
<meta property="og:type" content="article" />
<meta property="og:image" content="{{.Params.Image | absURL}}" />

This means that when you click one of the social share buttons, the shared post should pick up the correct image. Before I did this, it defaulted to the background image behind the header.
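The partial above sets og:title, og:type and og:image, but not og:url, which the protocol also lists as required. As a rough sketch of what a complete set might look like, here’s a small Python function that renders all four tags, escapes the values and turns relative paths into absolute URLs, similar in spirit to what absURL does in the template. The page dictionary and its keys are hypothetical; in a Hugo template, .Permalink would be the obvious source for og:url.

```python
from html import escape
from urllib.parse import urljoin


def open_graph_tags(page, base_url):
    """Render Open Graph <meta> tags for one page.

    `page` is a hypothetical dict with "title", "image" and "path" keys.
    Relative paths are resolved against `base_url` because share previews
    generally need absolute URLs (the job absURL does in the Hugo partial).
    """
    values = {
        "og:title": page["title"],
        "og:type": "article",
        "og:image": urljoin(base_url, page["image"]),
        # og:url is the page's canonical, absolute address.
        "og:url": urljoin(base_url, page["path"]),
    }
    # escape() with quote=True stops quotes and ampersands breaking the attribute.
    return "\n".join(
        f'<meta property="{name}" content="{escape(value, quote=True)}" />'
        for name, value in values.items()
    )


print(open_graph_tags(
    {"title": "An introduction to the semantic web",
     "image": "/images/semantic-web.jpg",
     "path": "/blog/semantic-web/"},
    "https://example.com/",
))
```

Escaping matters more than it looks: a title containing a double quote would otherwise end the content attribute early.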

Meta tags have long been used for search engine optimization (SEO). Here are some commonly recommended tags to include in the <head> element of an HTML page:

<title>A clickbait title</title>
<link rel="canonical" href="https://example.com/">
<meta name="description" content="A description of the content." />
<meta name="author" content="Your Name" />
<meta name="viewport" content="width=device-width, initial-scale=1" />
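If you want to check that a page actually carries the tags listed above, a short script can parse it and report what’s missing. This is a minimal sketch using Python’s built-in html.parser; the checklist and function names are my own, and it only tests that each tag is present and non-empty, not whether the title or description is any good.

```python
from html.parser import HTMLParser

# Hypothetical checklist mirroring the recommended tags.
EXPECTED = ("title", "canonical", "description", "author", "viewport")


class HeadAudit(HTMLParser):
    """Record which of the recommended <head> tags a page actually contains."""

    def __init__(self):
        super().__init__()
        self.found = set()
        self.title_text = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "link" and "canonical" in (a.get("rel") or "").split() and a.get("href"):
            self.found.add("canonical")
        elif tag == "meta" and a.get("name") in ("description", "author", "viewport") and a.get("content"):
            self.found.add(a["name"])

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Accumulate the text inside <title>...</title>.
        if self._in_title:
            self.title_text += data


def missing_head_tags(html):
    """Return the recommended tags that are absent (or empty) in `html`."""
    audit = HeadAudit()
    audit.feed(html)
    audit.close()
    if audit.title_text.strip():
        audit.found.add("title")
    return [name for name in EXPECTED if name not in audit.found]


print(missing_head_tags("<head><title>A clickbait title</title></head>"))
```

Feeding it the five tags above should return an empty list.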

The viewport tag tells mobile browsers to lay the page out at the width of the device’s screen, rather than at a wider desktop width that then gets shrunk to fit. You should also include <meta name="robots" content="noindex"> on pages that you don’t want search engines to index, such as error pages. And you should always include the alt attribute on images. But you can go much deeper and add microdata to your content using the schema.org vocabulary.
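Microdata works by adding attributes to the HTML you already have: itemscope and itemtype mark an element as an item of a schema.org type, and itemprop labels its properties. Here’s a sketch in Python that wraps a post’s details in BlogPosting microdata, using this post’s byline as the example. The function name and its arguments are hypothetical; on a Hugo site you’d write the equivalent markup directly in a template.

```python
from html import escape


def blog_posting_microdata(headline, author, date_iso, date_text):
    """Mark up post metadata as schema.org BlogPosting microdata.

    itemscope/itemtype open an item, itemprop names each property, and the
    nested Person item gives the author its own type.
    """
    return (
        '<article itemscope itemtype="https://schema.org/BlogPosting">\n'
        f'  <h1 itemprop="headline">{escape(headline)}</h1>\n'
        '  <p>By <span itemprop="author" itemscope itemtype="https://schema.org/Person">'
        f'<span itemprop="name">{escape(author)}</span></span></p>\n'
        # For <time>, microdata takes the value from the machine-readable datetime attribute.
        f'  <time itemprop="datePublished" datetime="{escape(date_iso)}">{escape(date_text)}</time>\n'
        "</article>"
    )


print(blog_posting_microdata(
    "An introduction to the semantic web", "Andrew Owen", "2023-11-09", "9 November 2023"
))
```

Readers see the same headline, byline and date as before; the extra attributes are there for machines.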

To conclude, while we’re waiting for Web 3.0 to arrive, it’s a good idea to start using metadata. If nothing else, it will be useful to our future robot overlords.