This was a well-attended workshop in the parallel sessions, and the attendance wasn’t down to it being close to the previous coffee break, because it was well hidden in the maze-like Vanbrugh building. This is my report; obvious factual errors are most likely due to me mishearing, mistyping or misunderstanding. If you spot any errors, let me know.
Paul Kelly and William Mackintosh described the process the University of York went through in looking at alternatives to its current combination of Analog and Report Magic. The current system is stable, has good report-writing functionality and is free, but it is a “one size fits all” solution that gives all units the same reports, it is poorly understood by staff, and there are doubts about the effort required to maintain it in the medium to long term.
In selecting alternative solutions to explore, both open source and proprietary systems were examined. Open source systems raised concerns about overhead and maintenance. Two proprietary systems, Nedstat and Google Analytics, were selected for more in-depth study.
Both systems use JavaScript and cookies to report user activity to third-party servers, rather than analysing webserver logs. The appropriate magic for both systems was added to a range of webservers under their control, and the systems’ performance was compared, both to each other and to the previous system.
The first and most obvious result was a huge drop in the number of “hits.” Both systems only count clients that run the JavaScript on web pages and allow third-party cookies, thus excluding: users with conservative cookie settings, users with JavaScript turned off, users whose browsers don’t support JavaScript or cookies (or both), and web crawlers. Examination of the (existing) logs suggested that the majority of the drop was attributable to web crawlers. The impact of web crawlers was increased by the use of an institutional Google Search Appliance.
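To give a feel for the kind of log examination involved, here is a minimal sketch in Python that splits hits in an Apache “combined” format log into crawler and non-crawler traffic by user agent. It is not how York did it (that wasn’t described). The marker list, the function name and the sample lines are all my own assumptions; a real crawler list would be much longer.

```python
import re

# Assumed substrings for spotting crawlers; illustrative only, not a complete list.
BOT_MARKERS = ("bot", "crawler", "spider", "slurp")

# Tail of a "combined" format line: "request" status bytes "referer" "user-agent"
COMBINED_TAIL = re.compile(r'"[^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"\s*$')


def split_hits(log_lines):
    """Return (crawler_hits, other_hits) for an iterable of combined-format lines."""
    crawler = other = 0
    for line in log_lines:
        match = COMBINED_TAIL.search(line)
        if match is None:
            continue  # skip lines that are not in the expected format
        agent = match.group("agent").lower()
        if any(marker in agent for marker in BOT_MARKERS):
            crawler += 1
        else:
            other += 1
    return crawler, other


# Invented sample lines, purely for illustration.
SAMPLE = [
    '192.0.2.1 - - [01/Jan/2000:10:00:00 +0000] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0 (Windows; U) Firefox/2.0"',
    '192.0.2.2 - - [01/Jan/2000:10:00:01 +0000] "GET /a.html HTTP/1.1" 200 900 "-" "Googlebot/2.1 (+http://www.google.com/bot.html)"',
    '192.0.2.3 - - [01/Jan/2000:10:00:02 +0000] "GET /b.html HTTP/1.1" 200 300 "-" "gsa-crawler (Enterprise; example)"',
]

crawler_hits, other_hits = split_hits(SAMPLE)
print(f"crawlers: {crawler_hits}, others: {other_hits}")
```

Matching on user agent only catches crawlers that identify themselves, so a count like this is a lower bound on crawler traffic.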
The old system was not turned off, because there was seen to be a need to continue to generate the old metrics for on-going reporting.
Both systems had path analysis tools, but they could only be used to answer questions (or hypotheses) already posed. For example, in a debate whether units of enrollment should be called “courses” or “programs,” both could be tried and the systems used to determine whether users found the appropriate material without backtracking. It was seen to be important that only one change be made at once. Google calls this “goal analysis.”
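As a rough illustration of the “courses” versus “programs” comparison, the Python sketch below scores each visit by whether it reached a goal page without returning to a page already seen, then compares the two labels. This is my own simplification, not how either product computes its figures. The page paths, function names and sample visits are invented.

```python
def reached_without_backtracking(path, goal):
    """True if the visit reaches `goal` without revisiting any earlier page."""
    seen = set()
    for page in path:
        if page in seen:
            return False  # the visitor returned to a page they had already seen
        seen.add(page)
        if page == goal:
            return True
    return False  # the goal page was never reached


def success_rate(paths, goal):
    """Fraction of visits that reach `goal` without backtracking."""
    if not paths:
        return 0.0
    return sum(reached_without_backtracking(p, goal) for p in paths) / len(paths)


# Invented visits for each label; only the label differs between the two trials.
VISITS_WITH_COURSES = [
    ["/", "/courses", "/apply"],
    ["/", "/about", "/", "/courses", "/apply"],
]
VISITS_WITH_PROGRAMS = [
    ["/", "/programs", "/apply"],
    ["/", "/programs", "/contact"],
]

print("courses: ", success_rate(VISITS_WITH_COURSES, "/apply"))
print("programs:", success_rate(VISITS_WITH_PROGRAMS, "/apply"))
```

Changing only the label between trials is what makes the two rates comparable, which is why making one change at a time matters.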
The Nedstat representative present mentioned that this idea was taken to the extreme by certain commercial modules such as Mercado and TouchClarity, which integrate the statistics engine and CMS to allow dynamic alteration of many page features to optimize for particular goals. Typical alterations might be changing the size, colour and placement of buttons and links.
The JavaScript and cookies approach had difficulties with content that can’t carry JavaScript (e.g. PDFs, word processor files, etc.). Local references could have the necessary JavaScript added to the link, but this doesn’t work for non-local references, because the webmaster is not in control of the link. [maybe mod_rewrite could be used to redirect users to such JavaScript?]
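To make the local/non-local distinction concrete, here is a small Python sketch of the decision a redirect rule would have to make: given a request for a document, did the visitor arrive from a page on our own site (where the link could have been tagged) or from somewhere else? This is only my reading of the idea, not something presented at the session. The extension list, host names and function names are assumptions, and the Referer header can be missing or wrong.

```python
import posixpath
from urllib.parse import urlparse

# Assumed set of file types that cannot carry tracking JavaScript.
DOCUMENT_EXTENSIONS = {".pdf", ".doc", ".rtf"}


def is_document(request_path):
    """True if the requested path ends in one of the assumed document extensions."""
    path = urlparse(request_path).path
    return posixpath.splitext(path)[1].lower() in DOCUMENT_EXTENSIONS


def arrived_via_untagged_link(request_path, referer, site_host):
    """True if a document was requested from a link the webmaster could not tag."""
    if not is_document(request_path):
        return False  # ordinary pages carry their own tracking script
    if not referer:
        return True  # bookmark, email link, or the Referer header was stripped
    return urlparse(referer).hostname != site_host.lower()


# Hypothetical host and paths, for illustration only.
SITE = "www.example.ac.uk"
print(arrived_via_untagged_link("/docs/prospectus.pdf", "http://www.example.ac.uk/study/", SITE))
print(arrived_via_untagged_link("/docs/prospectus.pdf", "http://search.example.com/?q=prospectus", SITE))
```

Requests for which this returns true are the ones a rewrite rule might send to a tagged landing page instead of straight to the file.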
Privacy was raised, and it was mentioned that Google Analytics requires sites using it to have a privacy policy.
We went around the room introducing ourselves and outlining what systems we used. Here’s a quick tally of the systems. I’ve counted pretty much every system mentioned in a non-technically-disparaging manner, while attempting to avoid duplication where two people were from the same institution. In the order in which they were first mentioned:
Nedstat 8
Analog 12
Google Analytics 15
Evisit Analyst 1
WebTrends 3
Webalizer 1
Site Meter 1
AWStats 2
Netinsight 1
Log Analyser Pro 1
Paul and William would not be drawn on which system York was likely to go with, but seemed confident they would go with one or the other.
This is a good summary of the informative session provided by Paul and William. Just to point out a couple of things with regards to Nedstat – our cookies are accepted by default browser settings and medium browser security settings accept them without issue. We benchmark the Education Sector and have found that the average percentage of users with Javascript turned on is around the 97% mark. Nedstat does filter out bot/spider traffic automatically so there can often be a perception of traffic decrease when you switch from a log-file system that does not filter these. The reality is that we only capture real user behaviour which is much more valuable in the long term.
Thought I would just point out that the Nedstat code has a non-JavaScript section and a JavaScript section, so we can track users who do not have JavaScript enabled. With the non-JavaScript section we can still get cookie information, user agent details and IP information. Also, if the user does not accept third-party cookies, we can track the user with only their IP and user agent details.