And… farewell Blake.
The following is the last of the “Under the Hood” columns that have appeared on this blog for more than a year, courtesy of CBC.ca tech guru Blake Crosby.
Blake worked with CBC.ca for almost six years, first joining the team to babysit the servers during the Salt Lake City Olympics. He went on to work on other Olympic and elections sites, among others, and won an award for his work on CBC.ca’s Media Resource Locator tool (see his earlier column).
Blake has moved on to a company called VerticalScope. Long term, he’s working toward a career in aviation – you can track his progress on his flying blog.
Thanks, Blake!
~PG
————–
Farewell Webtrends, Hello Hitbox!
There have been some behind-the-scenes changes in the way we process our web server log files.
Webtrends
The software we were using previously was called Webtrends. It processes the raw log files from the web servers and produces graphs and charts.
The main advantage of Webtrends is that it processes the raw web server logs. Any time someone fetches content from our web servers, the request is recorded in a log file. Whether it comes from a mobile phone, Internet Explorer, your grandma’s 386, or a text-only browser, it’s all tracked.
Each entry records items such as your IP address, the page you requested, the type of browser you used, and the date and time. This provides a solid source of data to process.
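To make the fields above concrete, here is a minimal sketch of pulling them out of a single log line. It assumes the common Apache/NCSA "combined" log format; the column does not say which format our servers actually write, and the sample line, `LOG_PATTERN`, and `parse_line` are invented for illustration.

```python
import re
from datetime import datetime

# Assumed format: Apache/NCSA "combined" log line (an assumption, not
# necessarily what the CBC.ca servers produced).
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

def parse_line(line):
    """Return the fields of one log line as a dict, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    # Timestamps look like: 28/Nov/2007:11:16:00 -0500
    fields["time"] = datetime.strptime(fields["time"], "%d/%b/%Y:%H:%M:%S %z")
    return fields

# Made-up sample line using a documentation-only IP address.
sample = ('192.0.2.10 - - [28/Nov/2007:11:16:00 -0500] '
          '"GET /news/story.html HTTP/1.1" 200 5120 '
          '"-" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)"')
entry = parse_line(sample)
print(entry["ip"], entry["path"], entry["time"].isoformat(), entry["agent"])
```

Every request leaves a line like this regardless of the device or browser, which is why log-based analysis sees all traffic.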
One of the downsides was Webtrends’ requirement that the log files be in chronological order. Our website cannot meet that requirement directly, as we have many different log file sources whose entries are out of order relative to one another. Merging all these log files into a chronologically correct source of data for Webtrends added a lot of overhead.
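The merge step can be sketched roughly as follows. This is not the script we actually ran; it assumes each source has already been parsed into `(timestamp, line)` pairs, and the names `merge_logs`, `server_a`, and `server_b` are made up for the example.

```python
import heapq
from datetime import datetime

def merge_logs(*sources):
    """Return an iterator over (timestamp, line) entries in chronological order."""
    # heapq.merge requires every input to be sorted already, so sort each
    # source on its timestamp first, then merge the sorted streams.
    sorted_sources = [sorted(source, key=lambda entry: entry[0]) for source in sources]
    return heapq.merge(*sorted_sources, key=lambda entry: entry[0])

# Two hypothetical sources; server_a is out of order internally as well.
server_a = [(datetime(2007, 11, 28, 11, 16, 5), "a: GET /news/"),
            (datetime(2007, 11, 28, 11, 16, 1), "a: GET /sports/")]
server_b = [(datetime(2007, 11, 28, 11, 16, 3), "b: GET /radio/")]

merged = list(merge_logs(server_a, server_b))
for timestamp, line in merged:
    print(timestamp.isoformat(), line)
```

Sorting each source in memory is fine for a toy example, but with a full day of traffic from many servers this is exactly the kind of work that made the merge expensive.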
Changing Business Requirements
With the recent “upgrade” of the internet to Web 2.0, CBC needed to upgrade its website with more “Web 2.0” features. These included items such as the most viewed stories and the most e-mailed stories. The data behind them was available from the web server logs, but Webtrends couldn’t process it fast enough to be useful in real time.
This is where Hitbox, our new system, shines. The Hitbox product comes from a company called Visual Sciences (formerly WebSideStory). It serves the same purpose as Webtrends, except that it offers real-time data on people visiting the website. The data is collected not from log files but by JavaScript running on each page.
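As a rough illustration of what a "most viewed stories" feature needs, the sketch below keeps a running count of views inside a sliding time window. It is a hypothetical design, not how CBC.ca or any vendor implements it; the `MostViewed` class and the story paths are invented, and it assumes views arrive in time order.

```python
from collections import Counter, deque

class MostViewed:
    """Track story views inside a sliding time window (measured in seconds)."""

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self.events = deque()    # (timestamp, story) pairs in arrival order
        self.counts = Counter()  # story -> views currently inside the window

    def record(self, timestamp, story):
        """Register one view, then discard views that are now too old."""
        self.events.append((timestamp, story))
        self.counts[story] += 1
        self._expire(timestamp)

    def _expire(self, now):
        # Drop views whose timestamp has fallen out of the window.
        while self.events and self.events[0][0] <= now - self.window_seconds:
            _, old_story = self.events.popleft()
            self.counts[old_story] -= 1
            if self.counts[old_story] == 0:
                del self.counts[old_story]

    def top(self, n):
        """Return the n most viewed stories as (story, views) pairs."""
        return self.counts.most_common(n)

tracker = MostViewed(window_seconds=60)
tracker.record(0, "/news/a")
tracker.record(10, "/news/b")
tracker.record(20, "/news/b")
tracker.record(65, "/news/c")  # the view of /news/a at t=0 has expired by now
top_stories = tracker.top(2)
print(top_stories)
```

The point is that the ranking is updated as each view arrives, rather than recomputed from a batch of log files after the fact.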
For every single page you visit on CBC.ca, a cookie for “a.cbc.ca” will be set. Hitbox uses this cookie to track your movements throughout the website, which are recorded in real time. Although no identifiable information is recorded, we can see how individual users use the website.
That means content producers can track the performance of various areas of their sites in real time: which stories are most popular, the times of day with heaviest usage, the most common navigation paths through the site, what links users follow to and from stories, and so on. By watching specific live stats instead of waiting for a report, producers can make the content itself better reflect users’ actual behaviour.
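To show how an anonymous cookie makes navigation paths visible, here is a small sketch that counts the most common page-to-page steps. It says nothing about how Hitbox works internally; the function `common_transitions`, the visitor IDs, and the page paths are all hypothetical, and the input is assumed to be page views in time order.

```python
from collections import Counter

def common_transitions(page_views, n):
    """Return the n most frequent (from_page, to_page) steps.

    page_views is an iterable of (visitor_id, page) pairs in time order,
    where visitor_id stands in for an anonymous cookie value.
    """
    last_page = {}           # visitor_id -> most recent page seen
    transitions = Counter()  # (from_page, to_page) -> number of times followed
    for visitor_id, page in page_views:
        previous = last_page.get(visitor_id)
        if previous is not None and previous != page:
            transitions[(previous, page)] += 1
        last_page[visitor_id] = page
    return transitions.most_common(n)

# Made-up page views from two anonymous visitors.
views = [
    ("v1", "/"), ("v2", "/"),
    ("v1", "/news/"), ("v2", "/news/"),
    ("v1", "/news/story"), ("v2", "/sports/"),
]
top_step = common_transitions(views, 1)
print(top_step)
```

Nothing in the input identifies a person; the cookie value only ties one visitor's page views together.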
Posted at 11:16 am (28 Nov 2007)







The core schedule data is still coming from Program Guide. However, with the new Radio Two landing page it’s much more efficient, as the information is only read once per day, for everybody, in any time zone.



















