Visualising Open Source Projects

Our colleagues over at CETIS have been doing a lot of work on analytics recently so I thought it was about time I had a go at using some of the tools available for analysing open source software projects.

I’ve used Ohloh for quite a while. It is a useful tool for getting a general picture of a project and its committers, particularly its canned “factoids” relating to the overall level of activity, number of comments in the code, languages used and so on. At OSS Watch we also have some more in-depth tools and methods for looking into how a project is working. However, I was interested in how you might be able to visualise the activity of the community itself, and the first thing I found is a very nice application called Gource.

Gource reads the logs of most version control systems used by projects, including SVN, Git and Mercurial, and generates an animation showing the interactions between the people and the codebase – every commit is visualised as a person moving around the code structure and touching the files.

You can also pipe the Gource output into FFmpeg to create a movie. Here’s one I made showing the history of Apache Wookie – from project startup all the way through incubation at the Apache Software Foundation – in about 48 seconds:

[Embedded video: Gource visualisation of the Apache Wookie commit history]

The actual command used to generate the movie looks like this:

gource -640x310 wookie.log -t 48 -s 0.01 --hide filenames,dirnames,usernames -o - | ffmpeg -y -r 60 -f image2pipe -vcodec ppm -i - -vcodec libx264 -preset ultrafast -crf 1 -threads 0 -bf 0 wookie.mp4

The movie shows the project starting off with one contributor laying the groundwork before being joined by one or two others, and finally towards the end there are parts where you have four or five people working together, some focussed on one part of the project, and some on another.

You can also make the movie much slower, and show the usernames of the committers and the files and directories being worked on to get a better sense of “who is doing what” in the evolution of the project.
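As a rough sketch of how you might script that slower, labelled version, here is a small Python helper that assembles the same Gource and FFmpeg pipeline with the speed and hidden elements as parameters. It assumes gource and ffmpeg are both on your PATH; the function names and the example values (0.5 seconds per day, stopping after 300 seconds) are my own illustrative choices, not settings I used for the movies in this post.

```python
import subprocess


def build_commands(log_path, out_path, seconds_per_day, stop_after, hide=()):
    """Return (gource_cmd, ffmpeg_cmd) as argument lists.

    seconds_per_day: larger values give a slower movie.
    stop_after: seconds of animation after which Gource exits.
    hide: display elements to hide; leave empty to show
          usernames, filenames and directory names.
    """
    gource = [
        "gource", "-640x310", log_path,
        "--seconds-per-day", str(seconds_per_day),
        "--stop-at-time", str(stop_after),
    ]
    if hide:
        gource += ["--hide", ",".join(hide)]
    # Write raw PPM frames to stdout so they can be piped to FFmpeg.
    gource += ["--output-ppm-stream", "-"]

    # Same encoder settings as the command shown earlier in the post.
    ffmpeg = [
        "ffmpeg", "-y", "-r", "60", "-f", "image2pipe", "-vcodec", "ppm",
        "-i", "-", "-vcodec", "libx264", "-preset", "ultrafast",
        "-crf", "1", "-threads", "0", "-bf", "0", out_path,
    ]
    return gource, ffmpeg


def render(gource_cmd, ffmpeg_cmd):
    """Run Gource and pipe its frames into FFmpeg; return FFmpeg's exit code."""
    producer = subprocess.Popen(gource_cmd, stdout=subprocess.PIPE)
    consumer = subprocess.Popen(ffmpeg_cmd, stdin=producer.stdout)
    producer.stdout.close()  # let Gource see a closed pipe if FFmpeg exits early
    consumer.wait()
    producer.wait()
    return consumer.returncode
```

Calling render(*build_commands("wookie.log", "wookie-slow.mp4", 0.5, 300)) would then produce a slower movie with all the labels visible, while passing hide=("filenames", "dirnames", "usernames") gets you back to something like the original command.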

Unfortunately, what we see here is only part of the story, as these are all the commits made to the source code, but not the messages on the mailing list, the issues on the tracker, and the updates to the websites. For Wookie, for example, much of the growth of the community is seen in contributions on the issue tracker and mailing list – only a few made the transition into becoming committers.

It would be great to get the issue tracker history from Jira, for example, and include that in some fashion. With Apache projects at least there is also often an explicit link between a Jira ticket and related commits. Given that Martin Hawksey has shown you can use Gource for visualising OER deposits into Jorum using Google Refine, I’m sure it’s possible!
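One possible route, sketched here rather than tested against a real tracker, is Gource's custom log format, where each line is a Unix timestamp, a username, an action (A for added, M for modified, D for deleted) and a file path, separated by pipe characters and sorted by time. The Python below turns a list of issue events into that format. The IssueEvent class, the "issues/" path prefix, the mapping of event kinds to actions and the sample data are all hypothetical; getting the events out of Jira in the first place is not covered.

```python
from dataclasses import dataclass


@dataclass
class IssueEvent:
    timestamp: int  # Unix time in seconds
    user: str
    kind: str       # "created", "commented" or "closed" (my own labels)
    key: str        # ticket identifier, e.g. "DEMO-1" (made up)


# Assumed mapping: a new ticket appears as a new "file", a comment
# touches it, and closing the ticket removes it from the picture.
ACTIONS = {"created": "A", "commented": "M", "closed": "D"}


def to_gource_log(events):
    """Render events as Gource custom log text, oldest first."""
    lines = []
    for event in sorted(events, key=lambda e: e.timestamp):
        # The pipe is the field separator, so keep it out of usernames.
        user = event.user.replace("|", "/")
        action = ACTIONS[event.kind]
        lines.append(f"{event.timestamp}|{user}|{action}|issues/{event.key}\n")
    return "".join(lines)


# Made-up sample data, purely for illustration.
SAMPLE_EVENTS = [
    IssueEvent(1325462400, "bob", "commented", "DEMO-1"),
    IssueEvent(1325376000, "alice", "created", "DEMO-1"),
    IssueEvent(1325548800, "alice", "closed", "DEMO-1"),
]
```

Writing to_gource_log(SAMPLE_EVENTS) to a file should give you something Gource can play in the same way as a commit log, and in principle the lines could be merged with a commit history that has been converted to the same format, as long as the combined file stays in time order.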

I also made a movie of Apache Rave using the same settings; Rave is a much younger project but started out with far more committers, so the same activity here takes around 20 seconds:

[Embedded video: Gource visualisation of the Apache Rave commit history]

Finally, here is another video from the Gource team showing a few different projects, including Moodle:

[Embedded video: Gource team showcase of several projects, including Moodle]