Archive for May, 2007

Upskilling OSS Watch’s use of Subversion

For some years the OSS Watch team has been using Subversion. In fact, OSS Watch was the first team outside the Systems Development team (commonly called Sysdev here) to migrate to Subversion at Oxford University Computing Services (OUCS). There is no need to go into the reasons why OUCS switched to Subversion for version control, since I’ve detailed those elsewhere. However, it is worth pointing out that OUCS was already using version control across a large set of websites, including OSS Watch’s, prior to the switch to Subversion in 2003. The change was therefore merely one in the minutiae of working practices rather than in philosophy. Most sites at OUCS used, and continue to use, a distinction between preview and publish. Preview is the internal or possibly development version of the site, and Publish is the live version of the site.

In those early days we were simply using the version control system supported by our host institution. And since we were so small and focused more on awareness raising than practical software development support, the way we used Subversion was not that important to us.

Latterly OSS Watch’s focus has changed. In the past year we have grown significantly and now offer a substantive practical consultation opportunity for JISC-funded software development projects. As our attention turned to the practicalities of software development environments, especially open source development environments, we naturally began to reflect on our use of Subversion. Did our use of Subversion represent best practice? No. In fact, due to the way we had used Subversion over the first three years, there was no history preserved between preview and publish. There was also no easy way to simply change our working practices to use more standard Subversion commands such as svn merge or svn copy.

Of course most members of the team were perfectly capable of using Subversion differently in any software development project in which they might happen to be involved. Just how you use Subversion correctly is best explained in the freely available (and Creative Commons licensed) book Version Control with Subversion. Our questions instead were: Could we use Subversion this way within OSS Watch itself? What would justify the change-cost? And who would manage the process?

OSS Watch is nothing if not a learning team. So we sat down at one of our monthly full-day team meetings and thrashed out all the issues. Could we move from our current preview / publish set-up to the more traditional trunk, branch, tag set-up outlined in the svn-book (above)? Yes, but at some cost. After discussion with Sysdev, we concluded that the only way to make the move was effectively to start again from scratch and set it up right this time.

In the proposed new arrangement, trunk would become the new preview. Our new publish would be a branch, suitably named branches/publish. There were also a host of legacy issues with how we had been using the old preview: essentially it had become a dumping ground for content we wanted to share across the team but had no intention of putting on the website. That’s another challenge which we have safely shunted to another day by moving the old preview into branches/legacy.
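
As a rough illustration of that layout, here is a minimal Python sketch that only builds the Subversion commands involved, without running them. The repository URL is made up, the publish branch is written as branches/publish to match the branches/legacy path, and the log messages are placeholders. It is not a record of what Sysdev actually ran.

```python
import shlex

# Hypothetical repository URL; the real OUCS address is not given in this post.
REPO_URL = "https://svn.example.org/osswatch"


def layout_commands(repo_url):
    """Build (but do not run) the svn commands for the new layout."""
    return [
        # Create the three conventional top-level directories in one commit.
        ["svn", "mkdir", "-m", "Create trunk, branches and tags",
         f"{repo_url}/trunk", f"{repo_url}/branches", f"{repo_url}/tags"],
        # Branch publish from trunk (the new preview); svn copy records
        # where the branch came from, so history is shared from here on.
        ["svn", "copy", "-m", "Create the publish branch from trunk",
         f"{repo_url}/trunk", f"{repo_url}/branches/publish"],
    ]


if __name__ == "__main__":
    for command in layout_commands(REPO_URL):
        print(shlex.join(command))
```

Printing the commands rather than executing them keeps the sketch safe to try; each line of output could be pasted into a shell against a test repository.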

So, we had a possible plan, but what would the positive benefits be that would justify this effort? I think the benefits boil down to two:

  • reinforcing best practice in Subversion use across the OSS Watch team
  • beginning to use Subversion as a communications tool

Since we use our version control system every single day, the first benefit would be quickly realised. Our knowledge of using Subversion in much the way the svn-book recommends would become second nature. That would put every member of the team in a better position to help a JISC-funded software development project that might be wanting to use Subversion and needing some guidance. It also supports continuing professional development (CPD) for team members. It would be somewhat embarrassing to go off to a new job after having used Subversion for years while working for OSS Watch only to discover that you had never really got to grips with the fundamentals of Subversion use.

The communications point is not something that is exploited by all those who use Subversion. However, it is used as a communications tool in many software development projects. By arranging for Subversion to send a ‘commit’ email to a mailing list each time someone commits a change to a file, you can very quickly build up the ‘commit then review’ work practice. Of course we already had that practice within OSS Watch in that we would often send an email to the rest of the team when we had made a substantial change to a document on the site and were looking for internal feedback. But now we could automate this. For example, this is how the Apache Forrest project uses Subversion. I have certainly found that such a use of Subversion aids participation for newcomers who can more easily see changes as they are being made. It’s a good practice that can, in software development projects, also aid the development of community. For OSS Watch it would serve to reduce the number of steps that previously had been involved in the commit then review process. But it would also serve the double purpose of supplementing our working knowledge of how best to use Subversion for community building in software development projects.
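
To make the ‘commit’ email idea concrete, here is a minimal Python sketch of the message a post-commit hook might assemble. It is not the script OSS Watch or Apache Forrest uses. The function name, the list address and the example.org sender domain are all invented, and in a real hook the revision details would have to be read from the repository (for instance with svnlook) before being passed in.

```python
from email.message import EmailMessage


def build_commit_email(revision, author, log_message, changed_paths,
                       list_address="commits@example.org"):
    """Assemble a commit notification (hypothetical helper, sends nothing)."""
    stripped = log_message.strip()
    # Use the first line of the log message as the subject summary.
    summary = stripped.splitlines()[0] if stripped else "(no log message)"

    message = EmailMessage()
    message["From"] = f"{author}@example.org"  # assumed address scheme
    message["To"] = list_address
    message["Subject"] = f"r{revision} - {summary}"

    body_lines = [f"Author: {author}", f"Revision: {revision}", "",
                  "Log message:", stripped or "(none)", "",
                  "Changed paths:"]
    body_lines.extend(f"  {path}" for path in changed_paths)
    message.set_content("\n".join(body_lines))
    return message


if __name__ == "__main__":
    print(build_commit_email(42, "ross", "Fix typo in briefing note",
                             ["trunk/index.xml"]))
```

Delivery is left out on purpose. The point is only that everything a reviewer needs (who, what, why) can be put in front of the team automatically on every commit.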

For us, the above was sufficient to warrant the change-cost. The final question was simply who was going to manage the change? I’ve already hinted that this was a team decision. The practical steps – working up our migration plan, liaising with Sysdev, testing, support – were put in the hands of Ross Gardler and Stuart Yeates. Ross and Stuart are probably our most experienced Subversion users, at least with respect to software development projects. The migration project itself also served as an excellent opportunity for Stuart and Ross to work together on a tight, time-limited, core mini-project. So, once again, good for the team.

I’m delighted to report that the migration has gone well. The whole team is now working with the new set-up, ably supported by Ross and Stuart. I think it already qualifies as a success.

Is that the end of it? No. But it does lay the foundation for what we will try next. Soon we will start using tags to correctly indicate versions of a document that make it into support packs distributed to projects at OSS Watch events (call them releases if you like). There are even more steps planned, but slow and steady will get us there.

So why a blog post about this? Because I wanted to provide an insight into how OSS Watch grows as a team, learns new things, revises its work practices, and succeeds together. Maybe that’s something we can share with other JISC projects as well :-)

content, tagging and community on flickr

As some of you will know, I’m a big flickr user. I’ve been thinking recently about the way I use it, what works and what doesn’t work.

At the time of writing, I’d uploaded 6,380 photos to flickr over about 18 months, but 2,195 photos a minute were being uploaded by other flickr members. Many of my photos are licensed under Creative Commons. I have no idea how many photographs there are in total, but there are tens of millions licensed under Creative Commons, so it’s fair to say that my photos represent a vanishingly small proportion of them.

A number of my photos, however, have been well received and reused. My image of a marble Caduceus of Hermes in Ephesus, Turkey was reused by Wired magazine for a recent article. It isn’t the best possible picture of a Caduceus, but it’s a reasonable picture, with a clear subject, and perhaps more importantly, it’s clearly licensed under Creative Commons. Being free of people also helps reuse of images, since many countries have laws about the use of images of people.
Caduceus of Hermes

Other frequently reused images of mine include a photo of a burning bull at New Year’s Eve in Edinburgh, Scotland; a photo of sunset behind wind turbines in Palmerston North, New Zealand; and flower fields in Keukenhof, The Netherlands. All three of these images have been viewed on flickr more than a thousand times (and flickr appears to have a stricter definition of what a view is compared to some websites). None of these photographs is likely to win an award any time soon. One has the subject unnecessarily cropped and neither straight nor centred; one is blurred by virtue of being a hand-held long exposure; one is blighted by powerlines crossing the landscape; and the last has a foreground full of muddy colours.

What these images all have in common is that I’ve never met the reusers, and as far as I know, have never met or communicated with most of the thousand viewers (there is no list of viewers to browse so I’ll probably never know, but a small number may have been friends and family). If I am part of the same community as the reusers, it is only in the loosest possible sense. This isn’t a victory for the fabled web 2.0 community building.

If anything, this is a victory for tagging. Each of these photos is tagged with at least a dozen tags and the tags are largely descriptive of the image. Most are also in a small number of ontologically-related sets. There is no way of knowing how people are finding these photos, but the tags and the sets seem the likely candidates.

This is not to say that tagging and licensing are enough. I have thousands of well tagged, CC licensed photographs which get almost no views and no obvious reuse.

I’m also not trying to suggest that community has no place on the web or in flickr. I’ve benefited on several occasions from technical photographic advice in flickr communities.

It seems to me that if content is going to be reused, it needs to be good enough and easy enough to find.

An interesting side note: flickr just shut down a tool to game their ranking system.

Java and Ubuntu

A great deal of noise has been made about feisty fawn, the latest version of Ubuntu, and better support for Java and the NetBeans Java development environment. Better support for Java is a good thing, even a very good thing.

Most operating systems such as Linux, XP, Vista and MacOS are written in languages such as C and the closely related C++. These are traditional languages for writing operating systems. Since C was first written in 1972, it has come to dominate low level programming, where “low level” means close to the hardware representation of problems.

Since then there have been waves of higher level programming languages, embodying advances in software engineering and computer science. They aim to be “higher level”: to capture, represent and solve problems closer to the way the user thinks of them, rather than the von Neumann model of the underlying hardware. Java is one of these languages that has become particularly popular. Java has native representations of graphical interfaces, multi-threading, scope, object-orientation, complex data structures, non-western languages, documentation processing and much more, all missing from native C. While some of these were introduced in the later C++ and a myriad of competing libraries for both C and C++, C++’s need for deep compatibility with C and its lack of native support for the libraries’ functions held them back.

Problems arise when languages such as C and Java are mixed. Java and C are like oil and water, and while there are methods of making them work together on a fine-grained scale, getting a whole system to work properly together is challenging. They use different instruction codes, linking formats, memory models, validation methods and so forth: simply too many differences. Other languages, such as Smalltalk and Scheme, have similar problems, their magnitude being related to how much the language differs from C and the breadth of their offerings.

Integrating Java into Ubuntu raises a swathe of issues, some aggravated by Sun (who invented Java) trying to retain control over the language. It’s good to see Canonical and Sun are now getting together to sort out some of these issues. Unfortunately, in this case, feisty fawn does not live up to the hype. The package for NetBeans is little more than a placeholder: I still had to download the software from the NetBeans site (which I also had to do for some of the Java documentation, oddly enough).

There doesn’t seem to be any deep integration either. The NetBeans internal system for downloading new packages and upgrades has no connection (that I can see) back to the Ubuntu system for new packages and upgrades. Other languages such as Perl have infrastructure in place to package new modules and upgrades for the operating system’s native packaging system, which is always going to be more robust, secure and reliable and has features such as updating running services.

Sun’s open sourcing of Java will fix these problems, I believe, once they release the whole system as open source. Enabling individuals and distributions to repackage Java for their own systems with their own tweaks will mean greatly enhanced Java integration and user experiences on those systems. On the other hand, it will probably also mean a less consistent integration, user experience and interoperability across all platforms, undermining (at least in the short term) the “write once, run anywhere” objective of the language.
Sources:
http://www.macworld.co.uk/news/index.cfm?newsid=17962
http://www.linux-watch.com/news/NS2347985166.html
http://news.zdnet.com/2100-9593_22-6177641.html
http://weblogs.java.net/blog/robogeek/archive/2007/05/some_openjdk_an.html
http://www.netbeans.org/

Create a customised Google search engine

So, you want to build a community around your project. How do you do it?

I’m afraid this post will not answer that rather large question. What it will do is point you at a useful web tool you can make available on your web site. It’s a tool that will help visitors find information related to their interests and will therefore encourage them to return to your site frequently.

This tool works because all your visitors have something in common. You need to identify what this commonality is and capitalise on it. The initial temptation would be to say that the common ground is your project, but I suggest that this is a little narrow in focus. It is true that visitors have a potential interest in your project, but that potential may not have been realised yet. I suggest that the real commonality is in the problem domain your project is interested in. For example, if your project is an XML publishing framework, your community is probably interested in topics such as XML schemas, XSLT and publishing formats. Making your site a valuable resource for people with this common interest increases the value of your project site and therefore your project brand, thus increasing your chances of converting a visitor to a community member.

Most readers will now be thinking “yeah, that would be nice but we don’t have the resources to build such a resource.” Sure you don’t, and I’m not about to suggest you waste time building content not relevant directly to your project, but I do argue that you should encourage your users to see your project site as a hub of information about related topics. It is highly likely that there are existing sites that provide information of interest to your visitors. You need to provide access to those sites from within your own.

You could embed RSS feeds from other sites. That’s good, but you probably already do that (if not, why not?).

Another, less well known way of expanding the information base available to your users is to provide a search engine that gives more accurate results than a typical search engine. That is, provide a highly focussed search engine that uses a subset of the Internet rather than the whole Internet. The good news is that you can leverage the mighty Google search engine for just this purpose.

Enter Google Co-Op:

Google Co-op is a platform that enables you to customize the web search experience for users of both Google and your own website.

The idea is simple:

  1. identify keywords of interest to your community
  2. identify the web sites that are of most interest to your community members
  3. decide if you want to limit searches to these sites or to prioritise these sites in a whole web search
  4. link to your Google Co-Op search engine from your community site
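
The choice in step 3 can be pictured with a small Python sketch. This is a toy model over a list of result URLs, not Google Co-op’s API. The function names, the site list and the sample URLs are invented for illustration.

```python
from urllib.parse import urlparse

# Hypothetical sites a project about XML publishing might favour.
PREFERRED_SITES = {"www.w3.org", "forrest.apache.org"}


def host_of(url):
    """Return the host part of a URL, or an empty string if there is none."""
    return urlparse(url).hostname or ""


def restrict(results, sites):
    """Option one: keep only results from the chosen sites."""
    return [url for url in results if host_of(url) in sites]


def prioritise(results, sites):
    """Option two: keep every result, but list the chosen sites first.

    sorted() is stable, so the original order is kept within each group.
    """
    return sorted(results, key=lambda url: host_of(url) not in sites)


if __name__ == "__main__":
    sample = [
        "https://example.com/a",
        "https://www.w3.org/TR/xslt",
        "https://example.net/b",
        "https://forrest.apache.org/docs",
    ]
    print(restrict(sample, PREFERRED_SITES))
    print(prioritise(sample, PREFERRED_SITES))
```

Restricting gives the tightest focus but hides anything outside your list. Prioritising keeps the whole web available while still nudging visitors towards the sites your community values.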

That’s only the start of it. You can now add other people as collaborators on your custom search engine, and they will be able to add new sites to search. You can embed the search engine within your website, add refinements to results pages, change the look and feel and even make money through Google’s AdSense service (if your project structure allows you to do so). See the Google Co-op site for more details.

I’m setting up such a search engine for OSS Watch. It’s not embedded in our site yet, but may be in the near future. If you have a favorite OSS resource we should include in our Co-op search engine, please let us know.

The Digg Revolt or That’s My Number You’re Posting

I never expected to see a reference to Digg on the BBC News’ front page. Although both have esteemed positions among my RSS feeds, somehow I just couldn’t see their spheres overlapping. Digg, if you haven’t heard of it, is a technology news site that operates on a collaborative filtering principle. Web links are submitted by users, and other users vote for (or ‘digg’, to adopt the site’s cyber-hipster dialect) the links that they find interesting. Digg’s front page is a sociological snapshot of the preoccupations of geekdom. Stories on technological issues like Digital Rights Management, open source software and console gaming are side by side with links to political blogs, movie trailers, YouTube rips of last night’s Colbert Report and paparazzi shots of starlets getting out of cars with low suspension.

Being the kind of site it is, Digg has repeatedly covered the attempts by amateur technologists to circumvent the copy protection on the next generation of video discs. Sony’s Blu-ray and Toshiba’s HD-DVD formats both use complex encryption techniques to try to prevent consumers from duplicating their contents. Feelings run high on both sides of this issue. Many argue that it is unreasonable of content providers to prevent their customers from making backups of the movies that they buy. Content providers point to the widespread copyright violation on the internet as the reason that they have to take aggressive steps to protect their products from illegal duplication. In the case of DVDs, the encryption employed (dubbed CSS or Content Scramble System) was defeated rapidly, and this has prompted the copyright-supported industries to spend a lot of effort and expense on its replacement – AACS or Advanced Access Content System.

Media copy protection as an endeavour has one major problem – a truly secure scheme would have to prevent the consumer from watching the content. As consumers were not likely to tolerate this totally secure scheme, a compromise had to be found. Thus the means to decrypt (and watch or copy) the content are provided to the consumer on the disc, but they are hidden. Having learned from the failure of CSS – in which a software player was hacked to reveal its decryption key – the content providers have made AACS’ method of hiding the key extremely complex. Keys from the player’s software, the player’s disc drive and the disc itself are needed to play the content. Despite these complications, immense effort has been deployed by amateur technologists to find the keys, and earlier than many expected, cracks in AACS have begun to emerge. Until very recently these efforts had been ignored by the AACS-LA, the trade body that administers this copy protection technology. They always expected some keys to be compromised, and had built a key revocation system into AACS to deal with this. If someone hacked your model of player to gain its device keys, the AACS-LA would change the encryption on all future discs to make them unplayable on that model. As a consumer you would have to upgrade the player’s firmware yourself, have your retailer do it or be content to only watch old discs. Obviously this would annoy a lot of people but better that – the content providers reasoned – than have the entire format compromised as happened with DVD.
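
The effect of revocation described above can be modelled in a few lines of Python. This is a deliberately crude sketch. The real scheme is cryptographic, and none of that is represented here. The class names and player model identifiers are invented, and the only behaviour modelled is the one the paragraph describes: discs pressed after a revocation refuse to play on the revoked model, while older discs carry on working.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Disc:
    """A disc remembers which player models were revoked when it was pressed."""
    title: str
    revoked_at_pressing: frozenset

    def playable_on(self, player_model):
        return player_model not in self.revoked_at_pressing


@dataclass
class LicensingBody:
    """Toy stand-in for the body that administers the revocation list."""
    revoked: set = field(default_factory=set)

    def revoke(self, player_model):
        self.revoked.add(player_model)

    def press_disc(self, title):
        # Freeze the current list: later revocations cannot reach old discs.
        return Disc(title, frozenset(self.revoked))


if __name__ == "__main__":
    body = LicensingBody()
    old_disc = body.press_disc("Released before the hack")
    body.revoke("player-x")  # hypothetical compromised player model
    new_disc = body.press_disc("Released after the hack")
    print(old_disc.playable_on("player-x"))
    print(new_disc.playable_on("player-x"))
```

Freezing the list at pressing time is the whole point of the model: revocation only bites on future releases, which is why owners of a revoked player can still watch their old discs.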

However, in February 2007 it emerged that AACS had been compromised further. A hacked HD-DVD player had been used to successfully discover a ‘processing’ key. Although this key could be rendered invalid through revocation – just like the device keys – the hacked player could always be used to generate a new processing key from newly released discs. If the AACS-LA wanted to plug this hole in their copy protection scheme, they would have to prevent the dissemination of the newly-discovered processing key and all of its successors.

On May 1st, the AACS-LA started issuing takedown notices under Title One of the DMCA (Digital Millennium Copyright Act) to everyone listed in Google as having posted the key. If your site featured those sixteen hex pairs, the AACS-LA argued, you were abetting the circumvention of a ‘technological protection method’ as defined within the DMCA. Worse still, the so-called ‘safe harbors’ introduced within the DMCA to protect web sites and ISPs from copyright infringement by their users did not apply to the posting of these numbers. The AACS-LA were not claiming ownership of copyright in the numbers – it is likely that there is no copyright in them to be owned.

Digg had a story on the discovery of the processing key dating back to February. When they received a takedown notice from the AACS-LA requesting that they remove the story – which included the key itself – they complied. If they had failed to do so, they would have risked a civil action alleging contributory liability for copyright infringement. Unfortunately for Digg’s administrators, some of their users noticed that the story had disappeared. Due to the mass-moderated nature of the site and the prickliness of its users on the subject of censorship, a new story featuring the key was soon surfing to the top of the front page. Digg’s admins deleted that one too. Another emerged. That too was deleted, and the user who posted it was banned…

I was watching this happen just before going to bed on the night of May 1st. Digg’s admins had just put up a blog post explaining that they were a small team and that a civil action from the AACS-LA could destroy them. Stories featuring the processing key kept appearing then getting deleted. When I got up the next morning, every story on the front page featured the key and most of them a good few expletives. Checking the Digg blog I saw that the admins had given up.

On reflection I think that the BBC were right to give the story coverage. While it’s easy to caricature it as a cadre of geeks throwing a hissy fit, in fact it encapsulates a large issue around the emerging framework of internet reporting, and the legal frameworks that surround it. The incredible success of Digg’s collaborative filtering model of content discovery is being aped everywhere, from MySpace to Netscape.com to BBC News itself (with its ‘Most Emailed/Most Read’ boxes). Implicit in this model is the recognition that however good your editors are, your users are really the experts on what they themselves want to read. Digg’s owners make large amounts of advertising revenue for essentially just providing a sexy AJAX framework for users to interact, all under an attractive brand. Why is the brand attractive? Because it is defined by those it attracts – in a never-ending iterative process. Digg’s users have made the Digg brand stand for (among other things) rabid opposition to DRM. While legal action from the copyright-supported industries could destroy a site like Digg, alienating its users most definitely will.