Archive for August, 2007

Monitoring the buzz

At OSS Watch we like to know how people react to our events, our documents and our blog. Sometimes this feedback comes to us directly, via our personal email accounts, our catch-all info@oss-watch.ac.uk account, our mailing lists or even by phone.

Sometimes, people talk about us on their own mailing lists, blogs and websites. Initially we kept track of such feedback by checking inbound links reported by Google. This, however, has become less and less reliable, partly because Google now have separate indexes for much of the web that doesn’t show up in most searches. The total number of inbound links to the OSS Watch website, as reported by Google, seems to be static, even over periods when we know that new links were added, so this number appears unreliable.

Currently we keep track of such feedback using Google Alerts. These regularly run a given search across a number of the Google indexes and email you any new hits. Any Google query string can be used. I have alerts, for example, for “Stuart Yeates” and “OSS Watch,” but they can get very complex. I recommend using as tight a query as possible over as wide a range of media as possible.

Personally I find Google Alerts strangely reminiscent of alerting services in the bibliographic world, which have been running customised queries against newly published journal articles for at least ten years.

As mentioned recently on the mailing list, we’re also currently playing with Google Analytics.

So how do other people monitor the buzz they’re creating?

Review of Sun’s weblog publisher

Some time ago, I took a look at StarOffice, the commercial edition of OpenOffice. An educational version of StarOffice is available as a free (if large) download. I found that while it integrated nicely into my Ubuntu desktop, it didn’t have my personal holy grail, a grammar checker. (I’m still looking for a text editor or word processor with a grammar checker that runs on Ubuntu without emulation/virtualisation.)

Presumably as a result of this, I was contacted by the marketing people at Sun, inviting me to try their new blogging plugin. Being the slow learner that I am, I said yes.

Before I knew it, I had a 1.3 MB .oxt file in my inbox and shortly afterwards a CDR in the post. The first thing I did was to install the version on the CDR and check to see whether they’d got a grammar checker (they hadn’t). The second thing I did was click on the little icon inviting me to download the latest patch. The “patch” turns out to be 369.0M and I couldn’t download it anyway, since I don’t believe I’ve got a login to “SunSolve” which is apparently necessary. I’m a little bewildered as to why they shipped me a review copy of an out-of-date base system, particularly since it was a CDR rather than a mass-produced CD.

Installing the “sun weblog publisher” turns out to be an exercise in frustration.

  • All the dialog boxes pop up behind the current windows rather than in front of them
  • The install failed saying it can’t deploy a java launcher
  • Having worked out that it can’t find the version of java installed on the system and that it doesn’t look for $JAVA_HOME like every other piece of Sun software I know of, merely browsing to the appropriate tab in the options dialog seems to make Java work.
  • The extension installs into my home directory, rather than into a system-wide location.

Having set up the sun weblog publisher, it seems to do exactly what it says on the tin.

I’m sure I’d find it more exciting (not to mention useful) if I used an office application all day and had content already in office formats that I want to publish, but I don’t. My postings on the OSS Watch blog are mainly spur-of-the-moment reflections and musings, and the shortest distance between my brain and the blog is still via Emacs rather than an office suite. My postings to my photography blog are largely built around posting photos, and I store my photos on flickr, so I post blogs using flickr’s built-in blogging tool.

I’m guessing those people already using OOo, StarOffice or StarSuite and with volumes of content they want to publish will find it practical enough, and the integration with the rest of the suite seems pretty smooth (though I’m not an advanced user of the suite, so maybe there are kludges I’m not noticing).

Weblog publisher isn’t open source software; it’s a proprietary (10 USD) plug-in that Sun have created that runs in both open source (OpenOffice) and proprietary (StarOffice, StarSuite) frameworks across multiple platforms. I think it’s a great idea for a business model and a great way for Sun to leverage the huge install base of OpenOffice. I just can’t see myself using the software day-to-day.

Another review of the software can be found at Linux.com.

Mailing lists and RSS feeds

Recently there was a discussion on the managing community led projects mailing list about whether an RSS feed of the mailing list should be added to the OSS Watch planet. A number of issues came out of this that drove home to me:

  • People access information in very different ways (don’t assume how readers are going to access your content)
  • People subscribe to an information source for a particular set of reasons, and if the source changes and no longer interests them (or they no longer have time to consume the source), they unsubscribe (be consistent)
  • Information sources have different capabilities for reuse and repurposing (don’t assume that content on third party systems is readily repurposable)
  • People like to see information exactly once (don’t SPAM people)

My original suggestion went down in flames, but I learnt from it.

I’ve just heard that the next upgrade of JISCMail “before Christmas this year” will have better RSS support.

Communities can’t flourish in walled gardens

I recently posted on the dangers of using closed Social Networking sites to develop community. Since writing that post I have entered into numerous discussions about my position. These discussions have taken place in both “walled garden” tools as well as within open communities such as mailing lists, direct emails and the blogosphere. In that time I have listened to, and learnt from, many different views and I’ve started to come to the conclusion that, well, I’m only partially right when I say:

I predict only one or two of the current Social Networking sites will survive, and they will be the ones that share their network data first.

Randy Metcalfe pointed me at a BBC story in which Michael Geist agrees with my “sharing data” point:

 The better approach – for users and the sites themselves – would be to work towards a world of interoperable social networking.

However, Michael disagrees, like many others, with my claim that only one or two Social Networking sites will survive:

Some services may believe that it is in their economic interest to stick to a walled garden approach; however, given the global divisions within the social networking world, the mix of language, user preferences, and network effects, it is unlikely that one or two services will capture the global marketplace.

I agree with Michael, and others. There is lots of room out there for niche players. What we need is open standards for creating interoperable networks.

I recently started reviewing open source social networking tools and the standards they adopt. This will be published as an OSS Watch briefing paper sometime in the next couple of months. Please let me know, via your comments, of any social networking/news/bookmark tools you think I should look at.

IBM learns from open source

I’ve always claimed that I have learnt far more from my involvement with open source than through many years of education, employed work, contracting and continuing education. Open source exposes me to people from a much wider domain and skill set than any other aspect of my life.

In a recent interview Bob Sutor, IBM’s Vice President of Open Source and Standards made a similar observation about IBM and their involvement with open source:

It taught us how to better collaborate with others who don’t work for IBM; it demonstrated that business models can evolve; it showed us that a good intellectual property strategy balances both “open” and “closed”; and it taught us that software that grew up in a non-corporate setting can be excellent, wildly successful, and meet customer needs. Linux, along with other open source software and open standards, showed us that being flexible in our thinking and business models is lower risk than adamantly clinging to past practices that might have worked once but now aren’t solely what customers really need.

So, it’s not just individuals like you and me that can learn from open source, it’s also huge organisations like IBM.

Social Networking for Open Source Programmers

Ohloh is an open source network that connects people through the software they create and use. I’ve been monitoring it for some time and have been impressed with the direction it is going. It is gradually turning into a kind of social networking tool for open source programmers. It is not like MySpace and Facebook: relationships and activity are extracted from the commit logs of open source projects, so there is no need for people to manually maintain their relationships with others.

However, we should recognise that the approach of using version control commit logs is very limited. It does not recognise the contributions of users who report bugs and feature requests, assist in clarifying documentation and perform a great many other useful activities which do not show up in commit logs. It also misses people who do things like contribute to documentation (unless it is stored in version control). Similarly, those who participate in design discussions on the mailing list are not credited. Finally, it does not recognise activities such as evangelism and community development.

Even when we recognise these limitations, and only focus on programmer activity, there are problems. Raw logs only indicate the number of commits, not the value of those commits. For example, someone running a script to format the code will be seen to have made a major contribution, but in fact they have not added any functional value to the project. Similarly, the user who submits a patch fixing a really sticky bug will not be spotted by Ohloh, since the commit log will credit a committer, not the contributor, with the activity. Furthermore, Ohloh has no way of knowing it was a complicated bug they squashed.
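To make the point concrete, here is a minimal Python sketch of what a purely commit-count view sees. It is not Ohloh's actual method; the sample log, the names in it and the "author|subject" line format (the shape you might get from something like git log --pretty=format:'%an|%s') are all assumptions made for illustration.

```python
from collections import Counter

# Hypothetical sample log, one "author|subject" line per commit.
SAMPLE_LOG = """\
alice|Reformat whole source tree
alice|Reformat tests
alice|Fix whitespace
bob|Apply patch from carol: fix race in cache eviction
"""


def commits_per_author(log_text):
    """Count commits per committer name; says nothing about their value."""
    counts = Counter()
    for line in log_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        author, _, _subject = line.partition("|")
        counts[author.strip()] += 1
    return counts


counts = commits_per_author(SAMPLE_LOG)
for author, n in counts.most_common():
    print(author, n)
```

In this made-up log the reformatting script makes alice look like the largest contributor, while carol, who wrote the difficult fix, does not appear at all because bob committed her patch.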

It is possible for users to indicate their non-commit activity on a project, and a Kudos system allows others to acknowledge the value of their peers’ contributions. But these details have to be manually maintained.

When considering these limitations, it should be recognised that Ohloh is quite young, but developing fast. Furthermore, it is about to publish an API that will allow other projects to extend its functionality. For example, I’m currently working on a proposal that will allow social networking profiles to be extracted from publicly archived mailing lists. Hopefully their API will enable me to feed this data back into Ohloh via the Simal project.
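As a rough idea of what extracting relationships from a mailing list archive could look like, here is a small Python sketch. It is only an illustration under my own assumptions, not the design of the proposal and not anything to do with the Ohloh API: the messages are invented, and the "relationship" is simply who replied to whom, read from the standard From, Message-ID and In-Reply-To headers.

```python
from collections import Counter
from email import message_from_string
from email.utils import parseaddr

# Hypothetical archived messages (addresses and content are made up).
RAW_MESSAGES = [
    "From: Alice <alice@example.org>\nMessage-ID: <1@example.org>\n"
    "Subject: Design question\n\nHow should plugins load?\n",
    "From: Bob <bob@example.org>\nMessage-ID: <2@example.org>\n"
    "In-Reply-To: <1@example.org>\nSubject: Re: Design question\n\nLazily.\n",
    "From: Alice <alice@example.org>\nMessage-ID: <3@example.org>\n"
    "In-Reply-To: <2@example.org>\nSubject: Re: Design question\n\nThanks.\n",
]


def reply_links(raw_messages):
    """Count (replier, original poster) pairs using In-Reply-To headers."""
    messages = [message_from_string(raw) for raw in raw_messages]
    # Map each Message-ID to the address of whoever sent it.
    sender_of = {msg["Message-ID"]: parseaddr(msg["From"])[1] for msg in messages}
    links = Counter()
    for msg in messages:
        parent = msg["In-Reply-To"]  # None for thread starters
        if parent in sender_of:
            links[(parseaddr(msg["From"])[1], sender_of[parent])] += 1
    return links


links = reply_links(RAW_MESSAGES)
```

A real archive would need de-duplication of addresses and handling of missing or broken headers, which this sketch ignores.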

Even with its current limitations Ohloh is interesting, and it may even be useful. I’ll be watching Ohloh with interest.

Ubuntu tries to get community in the US

The Ubuntu project is having a major campaign to get “LoCos” up and running. These Local Community teams are centred in a geographical location, rather than around a piece of software as most Ubuntu teams are. The Ubuntu vision for LoCos is:

[E]nthusiasts and fans around the world have collect[ing] together in garages, universities and pubs to talk about their interest, learn from each other and help promote their interest.

Which is very similar to the generic Linux User Groups (LUGs) which have been widespread for the last 10-15 years, and before that Unix Users Groups. Ubuntu are co-opting this model, and using the LoCos as a focus for:

  • Distributing free install CDs and other promotional material
  • Install fests, release parties and other gatherings
  • Speaking bookings for Ubuntu speakers
  • Localisation of Ubuntu software, documentation and websites
  • Promotion of Ubuntu locally

Ironically, while Ubuntu appears to have had little trouble getting LoCos set up in the developing world, where the focus on localisation is very strong, they have been struggling to get LoCos up and running in the US. This is probably partly down to the existing entrenched user groups and partly down to the lesser demand for the resources that a LoCo can supply. The current campaign aims to get LoCos running in every state in the US and they’re doing well, with 39 state teams in the start-up phase.

Many of the LoCos seem to have been set up by existing Ubuntu developers in direct competition for developer mindshare with existing local user groups. It will be interesting to see whether this community seeding attempt is successful in the medium to long term.

If you’re interested in building community around your open source project, OSS Watch can help.

The licence doesn’t matter

Many newcomers to open source get caught up in licence discussions. However, when it comes to adoption of open source software in larger organisations and governments “rarely if ever do licensing questions come up.” At least this is a claim recently made by Dave Rosenberg (CEO of MuleSource) in his Q&A with himself, in which he discusses MuleSource’s decision to release under the newly approved CPAL licence.

Dave also says:

Open source is thriving in big companies and governments. I can’t even believe the uptake that is going on.

So, if the licence does not matter to these organisations, why are there so many OSI approved licences? Surely just one Open Source licence would do the trick?

I think the licence does matter, at least for some users. Dave does acknowledge this when discussing MuleSource’s adoption of CPAL as opposed to the GPLv3:

There are several reasons [for choosing CPAL rather than GPLv3]. First of all, we’re not convinced that there is enough clarity about the way our software works (typical deployments have Mule touching 2 or more other applications via many different methods like JMS, web services etc.) to be able to accurately explain how derivative works are created. There are also a host of other wacky Java/integration aspects that are not totally clear. Under no circumstance do we want to stifle adoption of the product or upset the user community.

Matt Asay, an advisor to MuleSource, observes:

I find this fascinating. In some projects, derivative works are fairly straightforward. Not in an ESB (Enterprise Service Bus) like MuleSource. To ensure maximum community contribution, therefore, MuleSource is bending over backward to ensure its customers and community have an easy-to-grok license.

The licence may not be a major consideration for organisations who intend to be users, as opposed to contributors. However, to be sustainable an open source project ought to be encouraging and enabling all users to become contributors.

Different licences not only differentiate between the options available to users and developers, but they also differentiate between the type of community that can be developed around an open source product. MuleSource recognise this and have chosen the licence most appropriate to their product and sustainability plans.

So, what licence are you going to choose? OSS Watch offer free consultations to UK HE and FE projects to help navigate this, and other, open source minefields.

Get more done with community led projects

How do you get people to work on your code without them knowing it?

At OSS Watch we are building an RDF-based project catalogue called Simal. The initial work was based on some work I did, with David Reid, over at Apache for their project catalogue. My contributions to this work built on Apache Forrest, although the Apache projects site eventually opted to use a Perl-based transformation system, so we now only share XSL stylesheets.

As part of the Simal project I have had to make significant improvements to Forrest. Of course, all this work has been donated back to Forrest and is now available to other Forrest users. But this post is not about the outward benefit; it is about the inward benefit Simal has from engagement with open communities.

Much of this improvement work has been to add plugin support for a new alpha feature in Forrest called the dispatcher. The dispatcher has been around in Forrest for some time and is just now starting to realise its full potential. Simal uses the dispatcher in ways it has never been used before. For example, Simal now includes features such as an Ajax browser based on Exhibit from MIT and an RSS feed reader using the Google AJAX Feed API.

Because most of these projects (Google AJAX Feed API excepted) are open source I’ve been able to produce a reasonably functional catalogue in very little active developer time: OSS Watch is devoting a mere half a day a week to this project.

But what happens when you hit a bug in another project’s code? Simple, fix it and apply a patch. But what if you can’t fix it, what if your knowledge of the project is insufficient?

Well that is exactly what happened recently. I hit a weird bug that prevented my Ajax features from working correctly in certain circumstances. I was lost, I did not know what the problem was, or even where to start looking. After narrowing down the problem as tightly as I could I posted a mail to the Forrest dev list asking for pointers. I tried to describe the problem in as much detail as I could. Around 45 minutes later Thorsten Scherler announced he was going to take a look at it for me.

Here we see the true benefit of an open, community led project. Others are willing to help, often with pointers and ideas, sometimes with actual development effort.

Why did Thorsten want to expend his energy on my problem? I can only guess at his personal reasons, but my first guess would be that because he is the original author of the majority of the dispatcher code and he recognises that I am a power user of his code, he considers me a valuable user. By supporting me, he will ensure that I continue to work with his code, to help identify and iron out bugs and to continue to enhance it when I hit its limits.

Furthermore, and most importantly, by telling me what the problem is and trusting me to review his commits in working on the problem I will learn about how the deep innards of the dispatcher work. This then means that in the future I can assist other users who hit problems with it. More satisfied users means a more sustainable project.

Around one hour later he reported that he had reproduced the problem and identified two potential locations where it may be rooted. One was code I am not familiar with, the other was in code I know well. Since this news came at the end of the day I decided not to debug, but to go to bed.

The next day I found another community member had made a suggestion that would help narrow the problem, so now there were three people working on it. I debugged the code I knew and found it was not the source of the problem. Since I was due to go on holiday I notified the community of my findings and went off to enjoy a long sunny weekend in a field listening to music.

When I returned Thorsten had committed what he thought was a fix. Fantastic, people solving my problems while I’m on holiday. People, that is, who are not in my team and are not directly related to Simal.

Unfortunately, the bug was still present, but Thorsten had indicated in his patch the area where he thought the problem lay. This information enabled me to perform further debugging work. Whilst I didn’t get to the root of the problem I was able to define and document a workaround in the Forrest issue tracker. Users who hit the problem later can now work around it until a fix is put in place.

I suspect the information in our discussions and the provision of my workaround will be enough for Thorsten to find the real cause of the problem and so the next release of Forrest will have one bug less.

This is an example of community development at its best. My thanks go out to all the wonderfully talented people working on Simal, even though most of them don’t even know they are contributing. At a rough count there are about fifteen of them, not a bad return on an investment of half a day a week.

Depositing documents in repositories: Which repositories should we use?

Recently during a discussion with Pete Cliff from RSP, the question arose “which repository, if any, should OSS Watch be putting our documents in?” The possible answers were:

  1. don’t put them anywhere
  2. put them in an institutional repository
  3. put them in a subject-specific repository
  4. put them in a funder-specific repository
  5. put them in a creative commons repository
  6. some combination of the above

I’ll confess straight-off that I’m enough of a bibliophile and library-lover that the first option doesn’t appeal. Besides, our entire remit at OSS Watch is dissemination based, and it makes no sense to hide our outputs under a bushel. Repositories are to open access what version control systems are to open source; it’s very hard to argue against their use except where copyright is unclear, which is generally a sign of larger problems.

Institutional repositories are something I’m less than confident about.

Recently I completed my PhD in Computer Science at the University of Waikato, and ingest of my thesis into the institutionally supported “Australasian Digital Theses Program” repository was a seamless part of the submission and degree granting process. It just worked; the only slight wrinkle was that I wrote in LaTeX rather than Word as the documented submission process assumed. I have confidence that in a hundred years a copy of my thesis will still be in existence (I have less confidence it will be of interest, but never mind).

Once I was awarded my degree I also deposited the thesis with my employer’s institutional repository (Oxford Eprints), primarily to increase the chance that future Research Assessment Exercise-type activities would also “just work.” Several months after depositing with Oxford Eprints, I got an email from OULS, saying that they were migrating from Oxford Eprints to Oxford University Research Archive, and that my deposit did not meet the collections policy for the new archive, so it would be dropped. Presumably the new archive either isn’t going to be used for RAE-type work, or PhD theses are not considered research. Dropping works from an archive seems like a crazy policy to me, since I always understood the difference between a library and an archive was that an archive didn’t drop deposits after ingest.

Thus I have very mixed feelings about institutional repositories.

Subject-specific repositories can also be very effective. arXiv.org is the canonical example here. It’s a huge repository of peer-review papers in physical and mathematical sciences built to overcome the problem that in hot topics research is moving faster than journals can be published on paper. arXiv.org has been around for more than ten years and remains well-supported, well-trusted and well-used. The problem I have with subject-specific repositories is that they are insular, they further entrench the two cultures and act as barriers to communication.

I strongly believe that the currently well-funded physical sciences would benefit from a little cross subsidising of the infrastructure of currently less well-funded fields. Look at Middle-Eastern studies in western universities before and after 11 September 2001. Literally overnight (at least for those of us down under at the time), this became a field of huge popular, political and academic interest. Shared infrastructure enables hot topics to scale up rapidly. Let’s not forget, either, that during the western European dark ages (when Galileo Galilei was on the Index Librorum Prohibitorum) it was the Middle-East that kept alive the foundations of those currently well-funded physical sciences, to the extent that the field of Algebra is named after the work of Muhammad ibn Mūsā al-Khwārizmī, who worked in Baghdad, and the text of “On Divisions of Figures” by Euclid is known only by re-translation from the Arabic.

There has been recent discussion of JISC setting up a repository for outputs of work they fund. This would allow them to ensure that materials they fund can be found and promoted as appropriate. It would also greatly enable enforcement of the rules about making outputs available to the UK higher and further education. For the fundees, a repository would enable them to quickly and easily provide evidence of the project outputs. In the medium to long-term I have my doubts about such a repository, however. While I have no doubt that Oxford University and the field of physics will be around in 100 years, I’m not confident that the JISC will be. Unless it’s clear where the repository will be in 100 years, I’m a little hesitant. Having it hosted by a research institution which is paid up-front to preserve them in perpetuity would be adequate. The JISC already funds Jorum, a repository of learning content, but my understanding is that unlike research outputs, learning content is not expected to be archived in perpetuity: it’s a library rather than an archive.

There are a number of archives using creative commons licences (and one or two others) as a political message. These have a particular resonance with the open archives, open source and open content movements, in that they actively work to promote content reuse and thoughtful use of copyright, and to build communities around shared content. While I have sympathies with these political messages, the archives are typically of uncertain long-term sustainability. In a hundred years the political message is unlikely to be relevant, and while some may adapt and evolve, many are likely to wither and vanish. If we’re lucky some will get absorbed into institutions which already have long-term sustainability.

Placing a document in multiple repositories has benefits. You potentially get the advantages of each of the repositories at the cost of redepositing the document. By creating two sets of metadata describing a document during deposit, there is the potential to later use that metadata for cross-walking the repositories (this only works, of course, if the document has a unique id across the repositories). But by diluting the unique holdings of a repository, you make it a less attractive target for preservation funding or to be absorbed into larger repositories. Automated ingest from one repository to another or deposit into two repositories with the same metadata schema doesn’t work either, since really what you’re building is a mapping from one metadata schema to another based on instances which have been human-classified.
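The cross-walking idea can be sketched in a few lines of Python. This is only an illustration under assumptions of my own: the two record lists, the field names "subject" and "category", and the identifiers are all hypothetical, and the sketch presumes the same document carries the same unique id in both repositories.

```python
# Hypothetical deposits of the same documents in two repositories,
# each classified by a human against a different metadata schema.
REPO_A = [
    {"id": "doc-1", "subject": "Software licensing"},
    {"id": "doc-2", "subject": "Software licensing"},
    {"id": "doc-3", "subject": "Community building"},
]
REPO_B = [
    {"id": "doc-1", "category": "Law"},
    {"id": "doc-2", "category": "Law"},
    {"id": "doc-3", "category": "Sociology"},
    {"id": "doc-9", "category": "Physics"},  # not held in repository A
]


def build_crosswalk(records_a, records_b):
    """Count (subject, category) pairs over documents held in both."""
    by_id_b = {rec["id"]: rec for rec in records_b}
    pairs = {}
    for rec_a in records_a:
        rec_b = by_id_b.get(rec_a["id"])
        if rec_b is None:
            continue  # no shared id, so nothing to learn from this record
        pair = (rec_a["subject"], rec_b["category"])
        pairs[pair] = pairs.get(pair, 0) + 1
    return pairs


crosswalk = build_crosswalk(REPO_A, REPO_B)
```

Each counted pair is evidence that a term in one schema corresponds to a term in the other. Without the shared id there is nothing to join on, which is why the unique id matters.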

There are a number of technical and consortia-based approaches which solve many of these problems. The US, UK or EU science funding bodies may decide that explicit direct funding for repositories with scope to match that of research libraries is beneficial in the short to medium term. Automated OAI-PMH techniques may evolve to the point where researchers expect full local mirrors of all significant archives globally (consider the UK mirror of arXiv.org). The large science publishers may realise that the on-coming light is not the end of the tunnel. Copyright law may get completely rewritten. But none of these are directly relevant to the question of which repository we should be putting our documents in right now.

Ideas anyone?