WEB 3.0 - Darwin's Revenge or The Next Big Thing?
Miles Faulkner
March, 2007
Last November John Markoff of the New York Times wrote a very
interesting article about the new tools and technologies that will
drive Web 3.0. Apart from being of general interest, this matters
because there are claims that within these new technologies there is
a “Google Killing” application. Initial reaction across blogs was
fairly negative, with an eclectic but interesting set of
observations.
So, rather than resort to a quick and merciless blog entry, I thought
it more interesting to dig deeper and get past the “label veneer.”
This ended up being an odyssey of link-following, and generally learning
more about some obscure and complicated new technologies ranging from
OWL to ONTOLOGY to SPARQL. In summary, they are technologies that
provide a foundation for the categorization, interaction, and delivery
of information available via the web.
• CONTEXT
• THE 3R HYPOTHESIS, A GENUINE UPGRADE?
• CHALLENGES
• WEB 3.0 - JUST NOT THIS ONE!
• EXISTING TECHNOLOGIES
• RECOMMENDATIONS
• WRAP UP
• DEFINITIONS
CONTEXT
There is currently a lot of discussion about the wave of open source
computing, peering, and how the world is “flat”. Arguably,
the search engine space is a major arbiter of “flatness.”
Contrary to this “flatness” theory, however, the exponential growth
of data makes the process of finding the right answers problematic.
When everyone contributes, that’s a lot of data!
The semantic web, or Web 3.0, by definition suggests the World Wide
Web is a jungle that needs to be tamed and sorted into a more organized
garden. A better analogy may be that the web provides instantaneous
access to anything in the world; but, as with mining for diamonds, if
you don’t have the tools or the right location you won’t
find what you're looking for.
To combat this issue, there are a number of start-up companies with
very decent funding that are hyping Web 3.0 technologies. Their emphasis
is on W3C
standards that will potentially make content on the web more “meaningful”.
Meaningful in this sense means that the words and content on any particular
site are tagged and linked to a contextual model (ontology)
that determines their relationship and meaning within a particular "class."
FOAF (Friend of a Friend) is an example of a popular new way to
represent information about people and the connections between them.
It’s an open source project that uses RDF/OWL tools to enable better
identification and linkages between people. It is a well explained
and practical example of the thinking in the Web 3.0 space. Because
FOAF data is structured by nature, the right query technique will
yield an exact result, much as a database query would. So if web site
developers adopt FOAF tools, every reference to a person would follow
a standard that structured queries can readily access.
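
To make the database comparison concrete, here is a minimal Python sketch. It is not FOAF tooling itself: it assumes the FOAF-style data has already been reduced to simple (subject, predicate, object) statements. The people, the example.org addresses, and the helper name people_known_by are invented for illustration; only foaf:name and foaf:knows are actual FOAF vocabulary terms.

```python
# Hypothetical FOAF-style statements; the people and URIs are made up.
FOAF_NAME = "foaf:name"
FOAF_KNOWS = "foaf:knows"

statements = [
    ("http://example.org/alice", FOAF_NAME, "Alice"),
    ("http://example.org/bob", FOAF_NAME, "Bob"),
    ("http://example.org/carol", FOAF_NAME, "Carol"),
    ("http://example.org/alice", FOAF_KNOWS, "http://example.org/bob"),
    ("http://example.org/alice", FOAF_KNOWS, "http://example.org/carol"),
    ("http://example.org/bob", FOAF_KNOWS, "http://example.org/carol"),
]


def people_known_by(person_name, data):
    """Return the sorted names of everyone the named person 'knows'."""
    # Map each person identifier to its display name.
    names = {s: o for s, p, o in data if p == FOAF_NAME}
    # Find the identifier(s) carrying the requested name.
    ids = {s for s, name in names.items() if name == person_name}
    # Follow foaf:knows links outward from those identifiers.
    known = {o for s, p, o in data if p == FOAF_KNOWS and s in ids}
    return sorted(names[k] for k in known if k in names)


if __name__ == "__main__":
    print(people_known_by("Alice", statements))
```

The point of the sketch is that the question "whom does Alice know?" has one exact answer that falls straight out of the structure, with no keyword matching or ranking involved.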

So when such sites are interrogated by Artificial Intelligence based
search engines, or just plain old “machines”, powerful new
services can be created. The proponents of this next big thing (NBT)
go so far as to suggest that artificial intelligence engines will then
be able to start learning at an exponential rate because they will have
a structured web as their playing ground.
Nova Spivack, a web entrepreneur and leading light of the Web 3.0
movement, speaks eloquently of this utopian learning model and
its implications. His background includes working for Raymond
Kurzweil, whose idea of a Technological Singularity is well
worth a diversionary read. The key theory behind it, the Law
of Accelerating Returns (an extension of Moore’s Law), is
very thought provoking (I recommend you check it out).
The common thread between Spivack's thinking and Kurzweil's could be
that a system able to learn by itself from the WWW knowledge base
would be one of the fundamental paradigm shifts along the
road to the major singularity that Kurzweil discusses. However,
my feeling is that this may well suffer from being MTBT (Much
Too Big a Thing) for anyone to really buy into.
THE 3R HYPOTHESIS, A GENUINE UPGRADE?
Thus we have the following landscape evolving. It’s fairly
clear that Web 1.0 was mostly about one-way content publishing: the
“read” web. Web 2.0 is now almost universally agreed to be
a phase of offerings that allow participative content/expression sharing:
the “read and write” web.
Web 3.0, or the semantic web, seems to be about the third “R”,
a form of grown-up reasoning capability emanating from the
integration of smart questioning machines and well formatted data. Note
that I use the term reasoning as opposed to meaning because the evidence
to date suggests that interpretation of content by artificial applications
currently IS a form of reasoning, i.e. an interpretation of what is being
scanned/indexed/viewed.
We could call this the 3R Hypothesis: a kind of “grown up”
or more mature web. It should be recognized that there is a great deal
of very interesting work going on in this space as well as venture capital
funding. Some of the companies are quite secretive – examples
include Radar
Networks, ZoomInfo
and Metaweb Technologies.
CHALLENGES WITH THIS HYPOTHESIS
Even so, for the 3R Hypothesis to come to pass, there are a number
of dependencies and problems that must be overcome.
The wild web will need to play “ball” with prescribed
standards. Prescriptive implementation is required. Most
blog critics around this topic believe that the web really is wild
and that common content/tagging standards will never come to pass
(i.e. RDF- and SPARQL-like adoption). I agree with this, but since a
lot of the current work is focused on text interpretation, this is
a potentially spurious argument. If you can read, why worry
about tags?
Data will need to be available and accurate enough to enable
the new technologies to answer the questions they are asked to answer.
Open access will be required. Today most sites want to be
found by search engines; arguably this is not going to be true
in the future. They may want to be found but keep their true content
to themselves to avoid aggregators stealing their advertising dollars. As
will be argued later, everyone is into peer technologies today,
but tomorrow may be different.
AI software must be capable of delivering the goods. The
web is no different than systems in general, and unfortunately systems
actually aren’t all that smart. The best AI applications are
very niche in nature (chess, risk, weather) and even then they
struggle to solve complex problems. This is compounded by the data
itself. Who decides what content is true and accurate in a world
where content is created by anyone with any purpose? There is a long
way to go before accuracy of a fact can be determined by the number
of instances of a statement on different web pages.
Other innovative drivers must not drown out this vision
of the future. It’s interesting to think about what
“it” is that is being searched. A novel perspective on this could
be a world where anything of importance has an IP address (because
network connectivity is ubiquitous and power is always on). Network
presence or physical location becomes more tangible than data as everything
becomes network aware. Entities that have a network address will store,
selectively publish and validate information about themselves.
A likely example would be automobiles that reveal their location
and status to other systems. FOAF is in a way a forerunner of this
concept. In that future state one has to believe that entirely new
modes of search, discovery and meaning will be developed around more
tangible entities. The web page becomes merely the gateway to internal,
more secure systems for physical entities.
WEB 3.0 WILL COME – JUST NOT THIS ONE!
Based on this line of reasoning it’s very unclear that the 3R
version of the Web is the next major milestone. This doesn’t mean
that it isn't valuable to think through the implications, or to examine
what the evolution of the web might mean in practical terms for businesses
and individuals. We shouldn’t, in fact, overlook the obvious:
search is still the hottest category online.
EXISTING TECHNOLOGIES, GOOD OL' SEARCH IS STILL IMPORTANT
Regardless of the next phase of the web, search engines still play
a dramatic role in our digital activities. According to Comscore
Media Metrix (FYI, a fantastic service for understanding
real traffic patterns online):
• 94% of all online Canadians went to a search engine in
December 2006.
• 84% of Canadians visited Google.
• CMM shows some 294 search engine sites that register in their
research.
This is a lot of competition. Note that they all have their own indexes,
databases and value propositions. It’s important, therefore, not
to miss the forest for the trees. Rather than looking to the Next Big
Thing (NBT), look to Darwin and the survival of the fittest. Web businesses
have succeeded first and foremost by capturing visitors. Watching who
wins this battle may be the best way to spot early paradigm shifts in
technology.
RECOMMENDATIONS
A number of critical points are therefore worth considering for channel
owners or serious web site operators.
The numbers don’t lie. SEO should be driving
at least 50% of your thinking around traffic generation and channel
growth. Even more importantly, search engines have the data on what
people want to find, what they know and what they buy, and thus may
become the ultimate research companies in the future (see for example
Google Trends).
Google has nowhere else to go. It has saturated
the search market in terms of traffic. Look at the other search engines,
which are all working on their version of the NBT. Yes, Google has a
huge share, but 30% of the online population do visit other SEs.
Google’s other problem is that more accurate search punishes
its bread and butter business: laser sharp results would mean
fewer click-through fees from advertisements. So get to know the others
better.
Text and content probably will start to matter.
The 3R style web and aggressive search engines could start to adopt
agents that understand sentences with “subject – verb
/ object” combinations. Being descriptive matters. It might
help people better understand your value proposition anyway!
Consider more carefully what to publish in front of the firewall.
If your model relies on traffic to your site, others may monetize your
content before you do, using better search technology. Worse, you may
find yourself in poor company as part of typical search results. Aggregators
are getting very clever. See Yahoo’s latest technical tour de
force, Pipes (this is pretty cool, FYI).
Consider RIA (Rich Internet Applications). The newer,
more powerful development languages contain more structured tagging
for content, which makes them inherently better suited for SEO (think
RDF/OWL; see DEFINITIONS). Not only will you be delivering
more interactive, powerful online experiences, the content will be
made more relevant to SEs.
WRAP UP
It is inevitable that the web as it grows older will go through major
gyrations of what is “hot,” i.e. the NBT syndrome. The 3R
stage is probably overly simplistic. It’s easy to see how some
could think that all the world’s knowledge is simply a hyperlink
away and thus attainable for commercial or intellectual advantage. But
in fact this could be like the space program – well intentioned
with many beneficial side technologies developed along the way.
It’s clear that already social participation and rich media (YouTube)
have created another dimension of content that needs to be searched.
It is quite likely that new rich media and, as earlier discussed, physical
entities themselves will be invented or added that create yet another
indexing challenge. Thus content types may change faster than search
engine technologies can be developed to sort through them.
Consider something else: what if today’s open source and “peering”
are in essence really a powerful meme (think social
trend)? Blogging is a huge trend in self expression, but how
long will it last? The web has been described by some as the ultimate
“meme vector.” It is also a meme accelerator. If it’s
a good idea it really gets around fast. This means that the current
“participation” meme will change or evolve, as with all trends.
What if tomorrow communities close up shop and withdraw support for
“openness” and a new tightly knit meme takes over?
Broad based search models could be in for difficult times; and the
Tipping Point could in fact tip the other way. The point about really
keeping a close eye on the numbers and where WWW traffic is going becomes
critical in this context; it will tell us of such changes in advance.
Finally, it’s important not to confuse technology trends with
memes, social trends, or simply commercial self
interest. I believe that overly grandiose, indeed revolutionary, labels
such as “Wikinomics” and the “World is Flat”
could be describing OURSELVES rather than the underlying enablers
of the web. There is nothing wrong with this so long as it’s recognized
for what it is. One danger is that perfectly valuable technical ideas
and tools do not get funding because they are out of step with any given
trend today.
Open source software has the benefit of many minds contributing to
it, but it doesn't necessarily create the most advanced or the most
stable technology from a software platform perspective. Yes, peer contributions
can play a very important role in the ultimate development of more advanced
technology, but this is a function of how much the contributors choose
to contribute.
So, it's more likely that Web 3.0 will turn out to be a label based
on people and how they want to interact than on the latest
technical functional developments. Since we have a very hard time figuring
ourselves out, knowing what the next upgrade will be could be tricky.
One thing is for certain – when it changes you’ll see it
first on the web!
Miles Faulkner
Principal, Faulkner Consulting
Toronto
ONE MORE ASIDE
A funny and slightly dark aside was recently made by the principal
academic behind “KnowItAll”, a program at the University
of Washington that has an online service called TextRunner
(yes, knowing it all does seem to be on their long term agenda!). Oren
Etzioni commented that his TextRunner application has “Attention
Deficit Syndrome” in that it often gets caught up in the random
nonsense of single “facts” in the online universe, and that
he needed to teach it “focus”. Sounds just a bit like a
science fiction plot, don’t you think?
DEFINITIONS
MEME - (http://en.wikipedia.org/wiki/Meme)
Refers to a unit of cultural information transferable from one
mind to another. Examples of memes are tunes, catch-phrases, beliefs,
clothes fashions, ways of making pots or of building arches.
A meme propagates itself as a unit of cultural evolution and
diffusion — analogous in many ways to the behavior of the
gene (the unit of genetic information). Often memes propagate
as more-or-less integrated cooperative sets or groups, referred
to as memeplexes or meme-complexes. In this context open source
collaboration is surely a typical example of a meme – a
belief in sharing.
OWL / Web Ontology Language
OWL is a prescriptive language used in the markup of web content.
It defines the concepts, attributes and relationships that exist
within a particular group or community. For instance, the word
“drive” is defined differently depending on which class or community
is using it: it has a clearly different meaning when used
in reference to automobiles, computers, hockey or psychology.
RDF / Resource Description Framework
A family of World Wide Web Consortium (W3C)
specifications which has come to be used as a general method of
modeling information, through a variety of syntax formats.
The RDF model is based upon the idea of making statements about resources
in the form of subject-predicate-object expressions. The subject
denotes the resource, and the predicate denotes traits or aspects
of the resource and expresses a relationship between the subject
and the object.
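
A statement of this subject-predicate-object shape can be sketched in Python as a simple three-field record. This is only an illustration and not an RDF library: the Triple class and the to_ntriples_line helper are hypothetical names, and the example.org address and the name "Alice" are placeholders. The predicate is the Dublin Core "creator" term, and the output imitates the N-Triples text format without attempting any escaping of special characters.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One RDF-style statement: subject, predicate, object."""
    subject: str
    predicate: str
    obj: str  # named obj to avoid shadowing Python's built-in object


def to_ntriples_line(t: Triple) -> str:
    """Render the triple in N-Triples style, with the object as a plain literal."""
    # Simplification: no escaping of quotes or other special characters.
    return f'<{t.subject}> <{t.predicate}> "{t.obj}" .'


# Placeholder statement: "the report at example.org was created by Alice".
statement = Triple(
    subject="http://example.org/report",
    predicate="http://purl.org/dc/elements/1.1/creator",  # Dublin Core "creator"
    obj="Alice",
)

if __name__ == "__main__":
    print(to_ntriples_line(statement))
```

Because every statement has the same three-part shape, software can store, merge and search statements from many different sites without knowing in advance what they are about.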
For instance, one way to represent the notion "The sky has
the color blue" in RDF is as a triple of specially formatted
strings: a subject denoting "the sky", a predicate denoting
"has the color", and an object denoting "blue".
SPARQL / SPARQL Protocol and RDF Query Language
Modeled loosely after SQL, the query language SPARQL
is emerging as the de facto RDF query language. On track to become
a W3C Recommendation, it was released as a Candidate Recommendation
in April 2006.
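
The core idea behind a SPARQL query, matching triple patterns that contain variables, can be sketched in a few lines of plain Python. This is a hand-rolled toy and not a SPARQL engine: the data, the ex: prefix and the function names is_var, match_pattern and select are all assumptions made for illustration. Terms starting with "?" are treated as variables, as in SPARQL itself.

```python
# Toy data set of (subject, predicate, object) statements; all values are invented.
TRIPLES = [
    ("ex:sky", "ex:hasColor", "blue"),
    ("ex:sea", "ex:hasColor", "blue"),
    ("ex:grass", "ex:hasColor", "green"),
    ("ex:sky", "ex:isAbove", "ex:sea"),
]


def is_var(term):
    """A term is a variable if it starts with '?'."""
    return term.startswith("?")


def match_pattern(pattern, triple, binding):
    """Try to extend `binding` so that `pattern` matches `triple`; None on failure."""
    result = dict(binding)  # work on a copy so the caller's binding is untouched
    for p_term, t_term in zip(pattern, triple):
        if is_var(p_term):
            if result.setdefault(p_term, t_term) != t_term:
                return None  # variable already bound to a different value
        elif p_term != t_term:
            return None  # constant term does not match
    return result


def select(patterns, triples):
    """Return every variable binding that satisfies all patterns (a join)."""
    bindings = [{}]
    for pattern in patterns:
        next_bindings = []
        for b in bindings:
            for t in triples:
                extended = match_pattern(pattern, t, b)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return bindings


# Roughly: SELECT ?thing WHERE { ?thing ex:hasColor "blue" . ex:sky ex:isAbove ?thing }
query = [("?thing", "ex:hasColor", "blue"), ("ex:sky", "ex:isAbove", "?thing")]

if __name__ == "__main__":
    print(select(query, TRIPLES))
```

Each pattern plays the role of a condition in an SQL WHERE clause, and a variable shared between two patterns acts like a join column, which is where the family resemblance to SQL comes from.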
SEO / Search Engine Optimization
The process of structuring web content in such a way that search
engines find, index and return content in a rational manner.