Farewell Webtrends, Hello Hitbox!

And… farewell Blake.

The following is the last of the “Under the Hood” columns that have appeared on this blog for more than a year, courtesy of CBC.ca tech guru Blake Crosby.

Blake worked with CBC.ca for almost six years, first joining the team to babysit the servers during the Salt Lake City Olympics. He went on to work on other Olympic and election sites, among others, and won an award for his work on CBC.ca’s Media Resource Locator tool (see his earlier column).

Blake has moved on to a company called VerticalScope. Long term, he’s working toward a career in aviation – you can track his progress on his flying blog.

Thanks, Blake!
~PG
————–

There have been some behind-the-scenes changes in the way we process and crunch the web server log files.

Webtrends
The software we were using previously was called Webtrends. It processes the raw log files from the web servers and produces graphs and charts.

The main advantage to using Webtrends is the fact that it processes the raw web server logs. Any time someone fetches content from our web servers, it is recorded in a log file. Whether the request comes from a mobile phone, Internet Explorer, your grandma’s 386, or your text-only browser, it’s all tracked.

Items such as your IP address, the page you were requesting, the type of browser you were using, and the date and time were recorded. This provided a solid source of data to process.
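To make those fields concrete, here is a minimal Python sketch that pulls them out of a single log line. It assumes the common Apache/NCSA "combined" log format, which may not be what our servers actually wrote, and the sample line and the name parse_line are invented for illustration.

```python
import re
from datetime import datetime

# Assumed format: Apache/NCSA "combined" log line. The real format used on
# the CBC.ca servers is not stated in this post.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

def parse_line(line):
    """Return the fields mentioned above, or None if the line does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    return {
        "ip": match.group("ip"),
        "path": match.group("path"),
        "agent": match.group("agent"),
        "time": datetime.strptime(match.group("time"), "%d/%b/%Y:%H:%M:%S %z"),
    }

# Invented sample hit from a text-only browser.
sample = ('192.0.2.10 - - [12/Nov/2007:09:15:02 -0500] '
          '"GET /news/story.html HTTP/1.1" 200 5120 '
          '"-" "Lynx/2.8.5"')
hit = parse_line(sample)
```

Lines that do not match the pattern come back as None, so a caller can count or skip them rather than crash.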

One of the downsides was the Webtrends limitation that the log files needed to be in chronological order. This is impossible with our website, as we have many different log file sources that are all out of order. There was a lot of overhead to merge all these log files into a chronologically correct source of data for Webtrends.
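The merge step can be sketched in a few lines of Python. This is only an illustration of the idea, not the preprocessing we actually ran: the function name, the sample hits, and the (timestamp, line) tuple layout are all assumptions.

```python
import heapq
from datetime import datetime

def merge_chronologically(sources):
    """Merge several lists of (timestamp, line) hits into one ordered list.

    Each source is sorted on its own first, because heapq.merge only
    produces ordered output when every input is already ordered.
    """
    ordered = [sorted(source, key=lambda hit: hit[0]) for source in sources]
    return list(heapq.merge(*ordered, key=lambda hit: hit[0]))

# Hypothetical hits from two log sources, the first one out of order.
web_server = [
    (datetime(2007, 11, 12, 9, 15, 5), "GET /news/"),
    (datetime(2007, 11, 12, 9, 15, 1), "GET /"),
]
second_source = [
    (datetime(2007, 11, 12, 9, 15, 3), "GET /sports/"),
]
merged = merge_chronologically([web_server, second_source])
```

Sorting every source in memory is the simple version. With real log volumes the sort itself becomes the expensive part, which is the overhead described above.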

Changing Business Requirements
With the recent “upgrade” of the internet to Web 2.0, CBC needed to upgrade its website with more “Web 2.0” features. This included items such as the most viewed stories, or most e-mailed stories. This real-time data was available from the web server logs, but Webtrends couldn’t process the data fast enough for it to be useful.
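For a sense of what a "most viewed stories" list needs, here is a rough Python sketch that counts page views inside a sliding time window. It is not how Webtrends or any CBC.ca system works; the class name, the ten-minute window, and the story paths are made up, and it assumes hits arrive in time order.

```python
from collections import Counter, deque

class MostViewed:
    """Count page views inside a sliding time window (in seconds)."""

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self.hits = deque()      # (timestamp, path) in arrival order
        self.counts = Counter()  # path -> views currently inside the window

    def record(self, timestamp, path):
        self.hits.append((timestamp, path))
        self.counts[path] += 1
        self._expire(timestamp)

    def _expire(self, now):
        # Drop hits that have aged out of the window.
        while self.hits and self.hits[0][0] <= now - self.window_seconds:
            _, old_path = self.hits.popleft()
            self.counts[old_path] -= 1
            if self.counts[old_path] == 0:
                del self.counts[old_path]

    def top(self, n):
        return self.counts.most_common(n)

# Hypothetical hits, with timestamps in seconds.
tracker = MostViewed(window_seconds=600)
tracker.record(0, "/news/a")
tracker.record(100, "/news/b")
tracker.record(200, "/news/b")
tracker.record(650, "/news/c")
most_viewed = tracker.top(1)
```

The counting is cheap. The hard part is getting hits into something like record() within seconds of the page view, rather than after a batch of logs has been collected and processed.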

This is where Hitbox, our new system, shines. The Hitbox product comes from a company called Visual Sciences (formerly WebSideStory). It works the same way as Webtrends, except it offers real-time data of people visiting the website. This is not done using log files, but JavaScript instead.

For every single page you visit on CBC.ca, a cookie for “a.cbc.ca” will be set. This cookie is used by Hitbox to track your movements throughout the website, and is recorded in real time. Although no identifiable information is recorded, we can see how individual users use the website.

That means content producers can track the performance of various areas of their sites in real time - understanding what stories are most popular, the times of day with heaviest usage, the most common navigation paths through the site, what links users follow to and from stories, and so on. By watching specific live stats instead of waiting for a report, the content itself can better reflect users’ actual behaviour.
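As an illustration of one of those reports, the Python sketch below counts the most common page-to-page steps per visitor. It says nothing about how Hitbox computes its reports: the function name and the sample data are invented, and visitor_id simply stands in for an anonymous cookie value.

```python
from collections import Counter, defaultdict

def common_paths(hits, length=2):
    """Count page sequences of the given length, visitor by visitor.

    hits is an iterable of (visitor_id, page) tuples in time order.
    """
    pages_by_visitor = defaultdict(list)
    for visitor_id, page in hits:
        pages_by_visitor[visitor_id].append(page)

    paths = Counter()
    for pages in pages_by_visitor.values():
        # Slide a window of `length` pages along each visitor's history.
        for i in range(len(pages) - length + 1):
            paths[tuple(pages[i:i + length])] += 1
    return paths

# Hypothetical hits from two anonymous visitors.
hits = [
    ("v1", "/"), ("v2", "/"), ("v1", "/news/"),
    ("v2", "/news/"), ("v1", "/news/story1"), ("v2", "/sports/"),
]
paths = common_paths(hits)
```

Here both visitors went from the front page to the news section, so that step is counted twice and the two steps that follow are counted once each.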


7 Responses to “Farewell Webtrends, Hello Hitbox!”

    joe says:

    Amateur question: is Google Analytics considered inferior?



    Kev says:

    It uses the same basic tracking method as HBX, but it is inferior in a lot of areas, most notably reports.

    Inferior is maybe the wrong word though - it’s aimed at the small-to-middling and relatively simple site, which ours is not. In fact, during the transition from Webtrends to HBX, a few show sites used GA on an ad-hoc basis. But it’s better to have a single standard measure across the whole site, which HBX can do.



    Peter J. says:

    Thanks Blake! Your peeks Under the Hood (“Inside the CBC [Website]”?) have been interesting; good luck at the new job.



    Blake says:

    Peter,

    Thanks!



    Ted Tenderson says:

    a few inconsistencies here. I’ve included the quote from the story and the actual truth in the next line.

    “The main advantage to using Webtrends is the fact that it processes the raw web server logs”
    - They all do that

    “One of the downsides was the Webtrends limitation that the log files needed to be in chronological order”
    - They most certainly do not

    “With the recent “upgrade” of the internet to Web 2.0”
    - Now this statement is standalone silly.

    “It works the same way as Webtrends, except it offers real time data of people visiting the website. This is not done using log files, but javascript instead”
    - It still needs the log files. Javascript tagging pages is complementary. Both Urchin and Webtrends do this.

    “For every single page you visit on CBC.ca, a cookie for “a.cbc.ca” will be set”
    - And if users have blocked cookies in their default browser settings? What then?

    “Hitbox to track your movements throughout the website, and is recorded in real time”
    - Real time? Really?



    Kev says:

    Not all web stats apps process server logs. HBX can, in addition to its other modes. It’s not something we use much though.

    The version of Webtrends previously in use here did, in fact, need an awful lot of log preprocessing - including merging the log files from our farm and CDN and sorting the result chronologically. Maybe they’ve sorted that since, or maybe you’ve just never used it in a large environment.

    Blake was indeed being silly with the web2.0 reference, good spotting there.

    If users block cookies, users won’t be tracked as closely. The percentage of users who reject cookies is easy to determine, and this is accounted for by the researchers using the data.

    It’s pretty close to real time, actually, on the order of seconds or at worst minutes. Note that this does not mean that we can do a CSI-esque tracking of a single incoming user or monitor them through a hacked infinite-resolution CCTV camera, but that hit counts in reports and so forth are up to date.

    This is in stark contrast to the Webtrends situation, where you really need to throw a lot of hardware at the log processing problem to get data on anything close to a reasonable timeframe. Again, if you’ve not worked in big shops you wouldn’t have seen this, but the processing overhead was basically what killed Webtrends for the CBC. I’ve worked in one place with equivalent traffic and a slightly less complex architecture that used Webtrends, and it was lucky that they had money to burn, because they needed it.

    You’re right that the ideal stats package would process all server logs for completeness and also track user activity within pages and across sessions using javascript-based tagging, but the costs would be massive and ongoing, and even if the money were there, there are better, more mandate-worthy things to spend it on. The real world involves trade-offs, and HBX was the better decision over a purely log-based system because it allows for a qualitative approach in addition to just quantitative.



    Paul Gorbould says:

    I believe Radio-Canada uses a different version of Webtrends, and they seem quite happy with it. We have both that and HBX running on my site, and although the stats are comparable the user-friendly HBX interface has been a huge boon for content producers. Oh, and cut Blake some slack - he drafted this before he left, and though I gave it a quick edit it didn’t get the revisions it deserved.