Traditional web analytics makes several assumptions about the user’s intent. Gary Angel, President of Semphonic, looks at a new set of opportunities for effectively measuring, analysing, and optimising your digital properties to drive better online performance.

A New Method for Digital Analytics
Surprisingly little actual analysis seems to get done in the world of Web analytics. A huge library of common statistical methods exists for analysing data: from simple methods like correlation to more complex methods like linear and non-linear regression or cluster analysis. Why, then, aren’t these techniques routinely employed in Web analytics the way they are in most other forms of marketing (and non-marketing) data analysis?
As Gary Angel, President of digital measurement and data analytics consulting firm Semphonic, explains, the topographic nature of websites prevents these basic statistical techniques from working well in the digital realm. He identifies several methods for solving the unique problems that digital data and website structure present, and highlights a method that can effectively enable statistical analysis of digital data by controlling the effects of topography. These techniques provide a completely new set of opportunities for effectively measuring, analysing, and optimising your digital properties to drive better online performance.
The Problem with Statistical Analysis in Web Analytics
When we do digital analytics, the essential behavior we see is a trail of where a visitor went in the virtual space of a single site. We assume that when visitors navigate to a place, they did so with “intention”; in other words, that the content visitors view is an accurate reflection of their interests.
This assumption that what a visitor chose to view is reflective of their interests and intentions seems like a fairly safe bet. However, the way visitors traverse a website is controlled, to some extent, by the options and paths you provide. A website has a structure so that the visitor is encouraged to travel on certain link paths to reach a destination. Indeed, some paths may not be available to a visitor at all. So a Website can sometimes be like a magician’s trick: the card you pick was forced on you and isn’t what you meant to choose at all. There is a fundamental tension between these two basic principles, intention and structure: if we don’t take account of the structure of a website when we examine behavior, we are highly likely to misread intention.

In a statistical analysis of traffic in San Francisco, HWY 1 would be strongly correlated with reaching the Golden Gate Bridge. In Web terms, we might assume that if getting to the Golden Gate Bridge is our success metric, then HWY 1 is a BIG contributor. And, of course, it is. In a totally meaningless fashion. There are only two ways to get to the Bridge and HWY 1 is one of the two. So if it weren’t strongly correlated, we’d be very, very surprised.
No analyst would ever be foolish enough to think that a straightforward correlation model would work for analyzing city traffic. Surprisingly, many have made exactly that same mistake when it comes to Websites.
Surprising, because Websites are very much like city streets. Some pathways are big and broad. Others small and narrow. And sometimes there’s no direct way to get from Point A to Point B.
Basic statistical analysis techniques aren’t designed to handle data sets where the data is topographically arranged – and the structure of websites creates a deep topology to web data.
Simple correlation analysis, for example, does nothing to separate out the impact of natural structure and visitor intention. So pages that are closely related navigationally are almost always highly correlated. This makes it impossible to interpret true intention and, therefore, renders simple correlation almost completely useless.
Creating a Topographical Analysis
So any real analysis of visitor behavior will have to take account of topology before it will be possible to infer correlation or intentionality.
From a heuristic standpoint, we think of websites as having a hierarchical structure. At the top is the Home Page, followed by Section or Main Menu pages, and underneath each of these pages lives additional content, often with further hierarchical nestings.
While clearly valuable, this type of abstract hierarchical ordering isn’t perfect. It doesn’t provide a clear representation of the distance between two points nor does it capture many of the intricacies of website structure. Key content, for example, is often directly available from the Home Page but may be “structured” as several layers deep within the website.
Nevertheless, many UI designs begin with a hierarchical structure diagram of the website and this type of representation is a good place to start thinking about capturing a topography in digital form.

With this type of representation, we could map the distance between any two pages as the number of boxes along the hierarchy that have to be traversed to reach them. By instantiating an abstract digital hierarchy like this, the analyst can create a “design” view of the distance between any two points on the website.
For a complex website, building this abstract website structure is a lot of work, especially if a pre-existing design topology hasn’t been constructed as part of the UI design.
Still, as a digital representation of “design” view of the website it can be an invaluable asset to analysis.
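As a rough illustration of the “design” distance described above, here is a minimal Python sketch. The page names and the PARENT mapping are hypothetical, and the sketch assumes that “boxes traversed” means the number of parent/child links between two pages by way of their nearest shared ancestor.

```python
# Hypothetical design hierarchy: child page -> parent page (None marks the root).
PARENT = {
    "home": None,
    "products": "home",
    "support": "home",
    "widgets": "products",
    "gadgets": "products",
    "faq": "support",
}

def ancestors(page, parent):
    """Return the chain from `page` up to the root, starting with `page` itself."""
    chain = []
    while page is not None:
        chain.append(page)
        page = parent[page]
    return chain

def design_distance(a, b, parent=PARENT):
    """Count hierarchy links between a and b via their nearest shared ancestor."""
    # Steps needed to climb from b to each of its ancestors.
    depth_b = {page: steps for steps, page in enumerate(ancestors(b, parent))}
    # Climb from a until we meet a page that is also an ancestor of b.
    for steps_a, page in enumerate(ancestors(a, parent)):
        if page in depth_b:
            return steps_a + depth_b[page]
    raise ValueError("pages do not share a root")
```

Under this assumed hierarchy, two sibling product pages are two links apart, while a product page and the FAQ are four links apart because the route passes through the Home Page.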
Developing a Behavioral Topography
Fortunately, you don’t have to invent a topology for a Website. Using an algorithmic approach to analysis, it’s possible to create a behavioral topology of the website.
The behavioral map works by creating a topography of the website based on users’ “previous page” steps. To begin with, the analysis identifies all top-level pages: those where a majority of views are classified as “entries.”
Next, any page whose most common “previous page” is one of the top-level pages becomes a 2nd-level page, with the appropriate top-level page as its parent node. This process continues until every page on the website has been classified in the tree.
This behavioral topography is incredibly rich for analytic purposes. It can provide a deep measure of the difference between your designers’ view of the site and actual usage, and it can provide a topographic distance between points on the website so that correlation can be measured more appropriately. You can also use it to measure relationships between branches of the tree and to find places where jumps across trees are common or rare.
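The classification steps described above can be sketched in a few lines of Python. This is only one possible reading of the method, not a definitive implementation: the input format (a list of page and previous-page pairs, with None marking an entry), the function name, and the sample data are all assumptions made for illustration. Pages whose most common previous page never gets classified (for example, two pages that mostly link to each other) are simply left out of the result.

```python
from collections import Counter, defaultdict

def build_behavioral_tree(views):
    """views: iterable of (page, previous_page); previous_page is None for an entry.

    Returns {page: (level, parent)}; top-level pages have level 1 and parent None.
    """
    total = Counter()
    entries = Counter()
    prev_counts = defaultdict(Counter)
    for page, prev in views:
        total[page] += 1
        if prev is None:
            entries[page] += 1
        else:
            prev_counts[page][prev] += 1

    # Top-level pages: a majority of their views are entries.
    tree = {p: (1, None) for p in total if entries[p] * 2 > total[p]}

    # Most common previous page for every page that is not top-level.
    likely_parent = {
        p: prev_counts[p].most_common(1)[0][0]
        for p in total
        if p not in tree and prev_counts[p]
    }

    # Attach pages level by level until no further page can be placed.
    level = 1
    while True:
        current = {p for p, (lvl, _) in tree.items() if lvl == level}
        children = {p: par for p, par in likely_parent.items()
                    if p not in tree and par in current}
        if not children:
            break
        level += 1
        for p, par in children.items():
            tree[p] = (level, par)
    return tree

# Hypothetical page-view log for illustration only.
SAMPLE_VIEWS = [
    ("home", None), ("home", None), ("home", "products"),
    ("products", "home"), ("products", "home"), ("products", None),
    ("widgets", "products"), ("widgets", "products"), ("widgets", "home"),
    ("blog", None), ("blog", None),
    ("post", "blog"),
]
SAMPLE_TREE = build_behavioral_tree(SAMPLE_VIEWS)
```

In the sample log, the home page and the blog come out as top-level pages, the products page hangs off home, and the widgets page sits one level below products even though a few visitors reached it straight from home.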
Topography as both a Method and Solution
A topographic approach provides unique algorithmic website analytics that are fundamentally different from those you could hope to create with more traditional statistical methods such as correlation or regression.
However, one of the virtues of a topographic analysis is that, by establishing both a logical and behavioral distance between points, it provides a method of controlling for distance between two points. With distance available explicitly in a variable, it’s much easier for the analyst to incorporate it as a part of a standard statistical analysis.
With a direct incorporation of distance from the topographic mapping, there is no subjective assessment involved and the distance measurement is rigorous. Correlation to success between pages that are equidistant in the topology is MUCH more meaningful because you’ve controlled for the most important influence of Website structure.
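One simple way to picture “controlling for distance” is to compare each page only against other pages that sit at the same topographic distance from the success page. The Python sketch below does this under stated assumptions: the page names, distances, and conversion rates are invented for illustration, and subtracting a per-distance average is just one possible control, chosen here for brevity rather than taken from the method itself.

```python
from collections import defaultdict
from statistics import mean

def lift_within_distance(pages):
    """pages: {name: (distance_to_goal, conversion_rate)}.

    Returns {name: rate minus the mean rate of all pages at the same distance}.
    """
    by_distance = defaultdict(list)
    for distance, rate in pages.values():
        by_distance[distance].append(rate)
    # Baseline conversion rate for each distance band.
    baseline = {d: mean(rates) for d, rates in by_distance.items()}
    return {name: rate - baseline[d] for name, (d, rate) in pages.items()}

# Hypothetical figures: (distance to the success page, share of viewers who convert).
SAMPLE_PAGES = {
    "cart": (1, 0.40),
    "shipping_info": (1, 0.20),
    "blog_post": (3, 0.06),
    "about": (3, 0.02),
}
SAMPLE_LIFT = lift_within_distance(SAMPLE_PAGES)
```

With these made-up numbers, the shipping page converts far better than the blog post in raw terms, yet it underperforms its equidistant peer while the blog post outperforms its own. That reversal is the kind of signal a raw correlation would hide.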
Summary
Web analytics data isn’t directly analysable with straightforward statistical techniques because websites aren’t simply open collections of randomly accessible content. Pages on a website are not equidistant, and because websites embody a structure, it’s impossible to analyse behaviour interestingly unless you’ve first accounted for structure.
A topographical design model creates a logical model of the site (rather like a sitemap) and then counts distances between nodes in the hierarchy. Even better, a behavioral topology model can be built showing how users actually navigate the structure. From this model, distance between nodes can be calculated based either on the distance in the tree or on the average number of actual clicks between points.
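The click-based distance mentioned above could be measured directly from session paths. The following Python sketch assumes each session is an ordered list of page names and counts clicks from the first view of one page to the next view of another; the function name, that counting rule, and the sample sessions are all illustrative assumptions.

```python
from statistics import mean

def average_clicks(sessions, start, end):
    """Mean number of clicks from the first view of `start` to the next view of `end`.

    Returns None when no session visits `start` and then `end`.
    """
    gaps = []
    for path in sessions:
        if start not in path:
            continue
        i = path.index(start)
        try:
            # Look for `end` only after the first view of `start`.
            j = path.index(end, i + 1)
        except ValueError:
            continue
        gaps.append(j - i)
    return mean(gaps) if gaps else None

# Hypothetical session paths for illustration only.
SAMPLE_SESSIONS = [
    ["home", "products", "widgets", "cart"],
    ["home", "widgets", "cart"],
    ["blog", "home", "products"],
]
```

For the sample sessions, average_clicks(SAMPLE_SESSIONS, "home", "cart") averages a three-click path and a two-click path, and the third session is ignored because it never reaches the cart.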
These topographic models create numerous new analytic opportunities that few Web analysts have explored. They form a distinct set of analysis techniques that are quite different from traditional marketing analytics. Interestingly, however, they also open up the opportunity to use classic statistical analysis techniques more fully. By creating objective measures of distance and a true topography of the website, these models make it possible to look at the relationship between content and outcome on the website while controlling for the site’s inherent structure.
By Gary Angel
President
Semphonic
www.semphonic.com
