Releasing a new Open Source Project – BitTorrent Sync Indicator

BitTorrent, creators of the highly popular distributed peer-to-peer file sharing protocol, recently released BitTorrent Sync, a solution for syncing folders between machines based on the BitTorrent protocol.  BTSync provides a fully distributed and encrypted alternative to services like Dropbox where all your data is synced through a third-party server.

BTSync has been released for Windows, Mac, Linux and other platforms, although the user experience on Linux isn’t quite as polished as its counterparts – the only interface provided is via a local webserver accessed through your browser, while Windows and Mac get a nice desktop GUI with a system tray indicator.  I found this a pain as I’d sometimes finish making changes to a synced file and want to shut my computer down quickly, but had to open my browser first to check if the file had finished syncing.

While BTSync isn’t Open Source, the developers are very open to feedback from users and developers.  I quickly realised that I’d be able to use data from the web interface to create a desktop indicator for Linux, so in the open source tradition of scratching my own itch, I wrote a Python script that gave me an indicator to show if a file was syncing.  When it was workable, I stuck it on GitHub with an open source licence and made a post on the BitTorrent Labs forum.

I then noticed another post on the forum by a developer called Leo Moll – he was packaging BitTorrent Sync for Ubuntu and Debian distributions, and as I’d written my script with Ubuntu in mind, I asked if he’d like to include it in his packages.  He agreed and before long my indicator could be installed alongside a well integrated BitTorrent Sync client.

Here’s when things really took off.  With it being so easy to get hold of my indicator, people started using it and reporting bugs on the GitHub page.  Almost as quickly, they started submitting patches.  I got a new set of better animated icons for the indicator, various bugfixes for cases I’d not come across, new feature requests, and even someone packaging the indicator for Arch Linux.

Alongside this Leo and I were contacted by another developer who was packaging BitTorrent Sync for Debian and Ubuntu.  We had a discussion and worked out where best to focus our efforts to avoid duplicating each other’s work and creating conflicting packages.  Leo and I are now discussing merging our codebases to streamline our work and allow for better integration.

In the space of a month, what started as a little hack to make my life a little bit easier has become a vibrant project with an engaged community of developers and users.  The real key, I think, has been to make it as simple as possible to let users run the software, and to show I’m listening and responsive to feedback.

How to write better commit messages

Lolcat

Commit messages are an important part of how software is developed, debugged and maintained, and when done badly can become an unnecessary barrier to collaboration in open source projects.

Bad commit messages make it harder to figure out where problems have been introduced, especially for newcomers to a project.

The worst-case scenario for anyone trying to make sense of changes to a project is a commit message that offers basically no information for a major change affecting multiple locations in the code.

"Message: Fixed It. Files: 1366 modified"
“Worst commit message ever” via Jeff Dallien

 

To get a good sense of how commit messages are useful, take a project and look at its history in the revision system. You’ll see something like this:

  • Revision 1525597: Add ap_errorlog_provider to make ErrorLog logging modular. Move syslog support from core to new mod_syslog.
  • Revision 1514267: tweak syntax strings for ServerTokens
  • Revision 1514255: follow-up to r813376: finish reverting r808965 (ServerTokens set foo)
  • Revision 1506474: server/core.c (set_document_root): Improve error message for inaccessible docroot.
  • Revision 1498880: Replace pre_htaccess hook with more flexible open_htaccess hook

Or, if you’re unlucky, you might see something like this:

  • Revision 1525597: fixed it
  • Revision 1514267: more changes
  • Revision 1514255: bug fixes
  • Revision 1506474: more improvements
  • Revision 1498880: lots of changes

If you now imagine you’re looking to find out where, say, the ServerTokens syntax changed, you can see the value of providing good commit messages.

So, how can you write better commit messages? Below are some top tips.

Be brief

Commit messages should be brief and easy to scan. Often the reader of commit messages is viewing them in a log or revision history, so make sure the most important words and phrases stand out.

There is no hard rule about this. Some developers prefer an approach of having a very short one-line message but with optional subsequent paragraphs of context and description, whereas others prefer to only provide one line of any length, and link to detailed explanations elsewhere, such as in the issue tracker.
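For example (an invented commit), the subject-plus-body style might look like this:

  Fix encoding errors when importing articles with non-ASCII titles

  The importer assumed ASCII input, so titles containing accented
  characters raised an exception and the import was abandoned. Decode
  uploaded files as UTF-8 and add a regression test. See bug #1345 for
  the original report.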

However, you should use your common sense as to how much information should be in the commit message. If you find you’re writing lots of explanatory text, maybe you need to put more comments in the code itself where the changes are made, or add more detail to an issue in the tracker.

Make messages easier to find when searching

As well as scanning the revision history, developers also search logs using grep or similar tools, so it’s important to use terms that aid discovery. For example, if you use component or module names, make sure you spell them correctly and use them consistently: if a component is called “DownloadManager”, don’t write “Download Manager”.
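For example, with Git you can search the history for a consistently-named component with something like git log --grep="DownloadManager", which will only turn up the commits that spell the name the same way.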

Commit messages can also turn up in search engines, either in project-specific searches or in regular web search engines, so it’s important to be clear and consistent in your use of language.

Provide sufficient context

While brevity is  desirable, commit messages need sufficient context to be useful.

For a one-line fix, you can always view the diff to see what changed, but if a commit affects multiple files or multiple lines of code, it needs more explanation so that other developers and users can re-establish the context of the commit.

Peter Hutterer suggests a commit message needs to answer three questions:

  • Why is it necessary? It may fix a bug, it may add a feature, it may improve performance, reliability, stability, or just be a change for the sake of correctness.
  • How does it address the issue? For short obvious patches this part can be omitted, but it should be a high level description of what the approach was.
  • What effects does the patch have? (In addition to the obvious ones, this may include benchmarks, side effects, etc.)

You don’t need to go into a lot of depth, but you need to capture enough of what is going on that someone reading the revision history can get a sense of what your commit did without having to look at all the diffs.

Added unicode support for imported files to prevent encoding errors in article.title

This doesn’t necessarily need to be in the message itself – for example, if there has been a  discussion on the mailing list, or there is plenty of information in a related issue on the project tracker, then you can include a reference or link to this in the commit message.

Added unicode support for imported files to prevent encoding errors in article.title (see bug #1345)

Some issue trackers can also link commits to an issue automatically based on the commit message, in which case you need to make sure you’re using the correct format for it to pick this up.
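For example, GitHub will close a referenced issue automatically when a commit message contains one of its closing keywords, such as “Fixes #1345”; other trackers (JIRA, Trac and so on) have their own conventions, so check the documentation for the one your project uses.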

Provide credit and recognition where it is due

While you may be committing the changes, you may not in fact be the author – if you’re applying someone else’s changes, you need to acknowledge the fact and give the author recognition. Even if it’s not a complete submitted patch, but just an “if you change x to y that would fix the bug”, it’s worth putting in an acknowledgement.

Added unicode support for imported files to prevent encoding errors in article.title (see bug #1345). Thanks to Jane Doe for the patch

This has both a social function (placing credit where it is due) and also provides an audit trail.

(Some projects prefer a more formal “Submitted by: <username>” but I like to just say “thanks to <username>”.)

Avoid repetition

Troy Hunt provides another rule of thumb for commit messages:  subsequent commit messages from the same author should never be identical.

This is partly because it makes it more difficult to distinguish changes in the version history, and partly because each change should, logically, be different to the last.

Try not to swear or insult anyone

Fixed stupid $$&!! mistake caused by £$%$%@ Steve

OK, it is difficult sometimes, but let’s keep things professional. Save your venting for the IRC channel :)

More seriously, commit messages form part of the overall tone of communications for a project; snarky, rude and unhelpful commit messages don’t put your community in a good light, particularly for newcomers.

Check the logs to see how you’re doing

Every now and again it’s worth checking your log or revision history for your project, and reviewing the last page or so of commit messages. Would someone relatively new to the project get a good idea of what was happening? Can you improve the usefulness of the messages that you and your community members are writing?

Follow project guidelines

Your project might have a preferred format for commit messages, so make sure you find out before making a commit.

For example, Moodle’s commit message guidelines call for a message subject line consisting of the issue number followed by component name, and the rest of the subject up to 72 characters.
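An invented example following that pattern might look like:

  MDL-12345 course: fix backup failure for empty sections

but check the guidelines themselves for the exact format your project expects.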

Any more?

I’d love to hear any more suggestions for better commit messages (or your worst examples of bad practice!)

For a random commit message, give WhatTheCommit a whirl.

Photo credit: Wilson Afonso 

OSS Watch Community Workshop for TYPO3

In November, Mark and I will be in Mannheim, Germany as part of the TYPO3 Marketing Sprint Week, where we’ll be facilitating an OSS Watch workshop focussed on communications in open source communities.

TYPO3 logo

Effective communication in all its aspects is crucial for a healthy open source community, and we’re excited to be able to pull all of these aspects together into a two-day workshop.

You can find out more about the TYPO3 Marketing Sprint Week on their website.

If you’re interested in organising a similar activity for your project or organisation, get in touch with us.

BBC creates HTML5 TV appliance

The BBC R&D labs have recently been busy working on a TV prototyping appliance called the egBox. The idea behind it is to create a minimum viable product of an HTML5-based TV as the base platform for experimenting with new features.

The appliance uses HTML5, the WebM video codec, runs services using Node.js, and is being used as the basis for various technology experiments at the BBC. The set of components used by the egBox is one that many developers will be familiar with – node, redis, socket.io, backbone.js – on top of which the developers are working on areas such as TV authentication.

What is interesting is that, while the idea is to create a minimum viable TV product as the basis of other prototypes, the nature of the egBox stack suggests lots of interesting ways to extend it. For example, as it uses Node.js, you might make use of the Webinos platform to offer apps.

At the time of writing the egBox code is not open source; however, that is the eventual plan.

The usual sticking point for any foray into TV using open source is actually working with TV services themselves. The “Smart TV” space not only has a whole raft of competing and conflicting standards, but most of the consortia operate on a paid membership basis, some even requiring signing NDAs to read the documentation; this is something I covered in a post a few years ago.

Things have improved since then, but there is still a long way to go. Ironically, a W3C standard for encrypted media content might actually be a good thing – or at least, a less-bad-thing – for open source TV, as W3C standards are royalty-free and publicly available unlike many of the specifications developed within the TV industry.

The upshot is that any open source project looking to work with a good range of TV services is likely to have to pony up some membership fees, and potentially keep some important parts of the codebase closed source to avoid issues with partners and consortia.

Still, it’s going to be interesting to see what’s possible with the egBox.

You can find out more on the egBox over at the BBC’s R&D blog.

How “open source” is the Minnowboard?

This week, Intel announced the Minnowboard, a small embedded development board akin to the Raspberry Pi, BeagleBoard and similar devices. The point that grabbed my attention is that it’s being touted as an “open source computer”. The device is shipped running Ångström and is compatible with the Yocto project for building custom embedded Linux systems, but while there are many devices available that run Linux, the term “open source computer” is seldom bandied about. So just how “open source” is the Minnowboard?

The minnowboard viewed from above, with labelled external interfaces

The Minnowboard from above

For a start, the board uses Intel chips, which is usually a good sign that the drivers required will be open source, without requiring any binary blobs in the Linux kernel. Furthermore, the UEFI system is open source. This is the code which executes when the computer first powers up and launches the operating system’s boot loader, and making this open source allows hackers to write their own pre-OS applications and utilities for the Intel platform, an opportunity we don’t often see on consumer devices.

Update: Scott Garman’s comment below clarifies the situation regarding graphics drivers and initialisation code.  A proprietary driver is required for hardware accelerated graphics.

However, moving away from the software, there’s a clear message that the Minnowboard is meant to be “open source hardware”. There are of course competing definitions from groups like OHANDA and OSHWA as to what qualifies as “open hardware” and “open source hardware” – one we heard at Open Source Junction was that all components should be available from multiple sources, which is never going to be the case here – but a reasonable metric in this case would be, “is one free to modify the design and make a similar device?”.

The language on the site certainly seems to suggest that this is the intention. The Minnowboard Design Goals page clearly states:

Our goal was to create a customizable developer board that uses Intel® Architecture and can be easily replicated. It is a simple board layout that lends itself to customization. The hardware design is open. We used open source software as much as possible. We used standard (not cutting edge) components that are in stock and affordable, to keep the cost down.

Also, the introductory video explicitly says that the technical documentation will be available under Creative Commons licenses, allowing people to modify the designs without signing an NDA. That said, this documentation isn’t currently on the website; the only reference is a notice saying that “By August 2013 we will post all board documentation, software releases, links to Google Group forums, where to buy, and information of interest to the community.” We’ll just have to be patient.

Update: The schematics, design files and bill of materials are now available on the Technical Features page of the Minnowboard website.

Minnowboard with a type-C lure expansion template attached.

There are 3 formats for “lure” expansion boards – the Type C template here shows the smallest.

Another vector to openness for the Minnowboard is the opportunity to create daughter boards, dubbed lures. These are akin to Arduino shields and allow additional components to be plugged in to the main board to expand its capabilities. There are already several designs taking shape, and there’s certainly the potential for a community to arise around the creation of these lures.

What Intel have produced is an open platform with standard components and interfaces for prototyping and developing embedded systems. Unlike the Raspberry Pi, which is designed as a learning device, the Minnowboard’s design (once released) could represent a starting point for both hobbyist projects and commercial products, without any royalties to be paid to its original designers (except, of course, that you’ll need to buy your chips from Intel). From Intel’s point of view, this is clearly a move to gain traction in the ARM-dominated market of embedded systems. As far as calling it an “open source computer” goes, once the designs are published, I think they’ll have done a pretty good job of justifying the term.

Images from Minnowboard Wiki users Jayneil and Prpplague, used under CC-BY-SA.

Unlicensed code: is it ever OK?

Car With No License Plate

In an earlier post Mark Johnson responded to recent commentary about unlicensed code on Github. Mark criticised the idea put forward by some pundits that developers not licensing their software project was some kind of movement. Instead Mark sees it as emerging from a lack of education (or, quite possibly, sheer laziness). He also reiterated the point that a lack of licensing clarity discourages community and harms reuse and sustainability of software. Experienced developers won’t touch unlicensed code because they have no legal right to use it.

However, I decided to follow up by seeing if I could start from the other end of the argument and identify some good – or at least acceptable – cases for where you might legitimately make your source code available intentionally without applying a license.

Here’s what I’ve come up with.

Deferring the licensing decision

Licenses interact with your choice of business model. For example, some licenses are more useful than others when pursuing a dual-licensing strategy; some make more sense for software that provides online services; and each license provides some degree of advantage over others for particular cases (if they didn’t, there wouldn’t be so many of them!)

However, for some projects it’s hard to identify early on what the business case is going to be, or even whether there is likely to be any point in developing one.

Your software experiment may turn into a liberally-licensed library, a copyleft and commercial dual-licensed application, or a service offered under something like the CPAL or AGPL, but maybe it’s too early to tell. Should you keep it under wraps while you work out where it’s going, or share it now and risk selecting the wrong license?

Releasing your code with no license while you are still deciding on an appropriate model is one possible option. The downside of this is that no-one will really be able to reuse your code until you do apply a license, and it is also likely to deter potential collaborators.

So even here I’d still recommend choosing a license and revisiting the choice later as the project matures: as the owner of the intellectual property for your software you always have the option of changing your mind, and your license, later on.

Changing licenses for software can be controversial and difficult, but at least you have more chance of developing a user community and partner network to have this argument with by making the initial code available under a recognised license.

Note that it’s also much easier to change from a more restrictive license to a more permissive one than the other way around.

Software as evidence

There is a type of project where releasing code but not licensing it (effectively sharing code with all rights reserved) may make sense. This is where you have no interest in anyone else actually reusing your code, or building on it, or contributing to it!

Why on earth would you want to do that? Well, when the purpose of releasing the code is not to create viable software, but instead to provide transparency and reproducibility.

For example, if you have written software as part of an experiment, and you need reviewers to be able to replicate or inspect your work. In this case, there is no real expectation that anyone will take your code and reuse it for something else, or integrate it into any kind of distribution.

So maybe then you can just distribute the code, but not as open source or free software?

One reason why that may not be a great idea is that no-one else being able to make use of the code is just your perspective; from another perspective, maybe your code has a value you don’t realise. As Scott Hanselman points out, you can think of these kinds of projects as a “Garage Sale” where one developer’s junk is another’s treasure.

You may also be concerned that, by distributing your code under an open source license you may be raising expectations of what the code is for, or inviting a critique of your software development skills; this is a theme that Neil Chue Hong picks up on in a post over on the SSI blog. (Neil even points to a special license, the CRAPL, aimed at this sort of case.)

Even for very specialised academic code aimed at a single objective for a single paper in a specialist journal, the case can be made for releasing the code as Free or Open Source software.

(For another good discussion of this topic, see Randall LeVeque’s post Top ten reasons to not share your code (and why you should anyway).)

The Contractual Obligation Software Project

Sometimes you get to work on a project as part of some sort of funded initiative, which, while not stipulating sharing your code as open source, does expect you to at least make the code “available” in some fashion.

So, like an artist locked into a record contract, when the funding runs out you may be tempted to just make a code dump somewhere in order to meet your obligations, and in a fit of spite not even bother to put a license on it.

However, the “garage sale” metaphor works well here, too. Maybe the project or initiative didn’t exactly set the world on fire, but maybe some of the code written in the process could still be salvaged for something.

Gists and examples

You often find code snippets in blog posts or as solutions to questions on StackOverflow. This is very rarely explicitly licensed, but the assumption is that it’s usually OK to copy and paste without worrying too much about licensing. If you’re conscientious, you can always pop in a comment with a link to where you found it.

However, there are also grey areas, such as Gists, which are a bit more than a few lines of code, but not quite projects in their own right.

Even with a small snippet of code, it’s not always clear whether or not copyright protection applies. For example, a lengthy example of how a standard Java library should be used would probably not be protected, as it doesn’t involve much creativity. However, a two-line program that offers a novel solution to a problem could well be considered protected under copyright.

So, in some cases you may be justified in not bothering with a license for a snippet or Gist, but to avoid all uncertainty it’s still better to put in a license header, or at least make it clear that you’re willing to license the code to anyone who thinks it’s necessary.
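For example, a short header along these lines (substitute your own name and preferred licence) removes the doubt:

  # Copyright (c) 2013 Jane Doe
  # Released under the MIT licence: http://opensource.org/licenses/MIT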

Creating Fear, Uncertainty and Doubt

OK, I wouldn’t say this is a great reason, but it could be a reason.

Maybe you really do want to make people uncertain about whether they can use your code because … well, because that’s the way you roll.

Maybe you’re happy to license your code, but only to people who ask you nicely first, and you don’t want them to be able to distribute their code as free or open source software for some reason.

Or maybe you are looking to bait the incautious into copying your software so you can threaten them with lawyers and shake them down for money, because you are actually a Copyright Troll.

A copyright troll

Not necessarily wise, but not necessarily evil either

From this brief excursion I would conclude that distributing unlicensed code is never a great idea, and rarely even a good one, but I can see there are circumstances where you might consider doing it. In each case, though, there is usually a better option worth taking.

Car image by Su-May. Copyright Troll image by redtimmy

Unlicensed code: Movement or Madness?

One of the hot topics of commentary on open source development at the moment is the licensing situation on GitHub.  When code is committed to GitHub, the copyright owner (usually the author or their employer) retains all rights to the code, and anyone wishing to re-use the code (by downloading it, or by “forking” and modifying it) is bound by the terms of the license the code is published under.  The point of discussion in this case is that many (indeed, the majority) of repositories on GitHub contain no license file at all.

There are two troubling points to the commentary on this phenomenon.  The first is that some discussions suggest that publishing with no license is “highly permissive”, implicitly allowing anyone to take the code and do with it as they wish.

In fact, it’s usually the case that having no license on your code is equivalent to having an “All Rights Reserved” notice, preventing any re-use of your code at all.  Whether it’s the copyright holder’s intention to enforce these rights isn’t being made clear, but it’ll be enough to put off any company who might want to engage with such a project under an open development model.

The second troubling point is that commentators are time and again dressing this up as a wilful movement.  James Governor coined the term “Post Open Source Software”, while Matt Asay claims “Open Source Is Old School, Says The GitHub Generation”.  These commentaries seem to imply that there’s some sort of “No License Manifesto” being championed (in a similar fashion to the Agile Manifesto, perhaps).

The only movement I’ve seen which would be akin to this is the Unlicense, which encourages authors to wilfully lay aside any claims to their rights, effectively a Public Domain dedication which Glyn Moody has suggested is the way forward for open source.

However, what we’ve seen on GitHub shows no such conscious setting aside of rights; it shows a lack of education.  Publishing articles touting release without a license as how all the cool new kids are working encourages behaviour which could prove damaging to the development of a project’s community, and the wider community in turn.

Fortunately there are voices of reason in these discussions.  Stephen Walli of the Outercurve Foundation points out that governance == community.  If a project seeks to “fuck the license and governance” as James Governor suggests, then they risk doing the same to their community by alienating contributors (particularly those that are part of a larger organisation, rather than individual developers), as these contributors have no predictable structure to work within.

If the project lead might turn around and say “I don’t feel like accepting your contributions, and by the way, if you keep using my code I’ll sue you”, you’ve got very little incentive to work with them.

By neglecting your community in this way, your project is at risk of being limited to a few individual contributors who know and trust one another implicitly.  I can’t believe that developers seeking to allow permissive use of their code would be happy with this as an outcome.

GitHub haven’t yet made any suggestion that they feel this is a problem they should work to solve.  It’s our responsibility as a community to ensure that we educate newcomers to become responsible open source citizens, rather than encouraging them to follow established bad practices.

Licensing and governance analysis form two cornerstones of OSS Watch’s openness rating.  If you’d like advice on how to improve your project’s management of these areas, please get in touch.

Wookie: A case study in sustainability

At OSS Watch we periodically review all the resources on our main website to make sure they’re accurate and up to date. Last week it was time to revise our case study on Apache Wookie, which is a project I’ve been involved with for some time.

Wookie graffiti

OSS Watch became involved with Wookie while I was working in an EU project based at the University of Bolton. The project as a whole had done lots of interesting stuff, but as with many large projects the whole was somewhat less than the sum of its parts; the central joined-up platform wasn’t really going to take off after the project finished. However, in the process we had built quite a promising system for adding functionality to the core portal shell using the W3C Widgets specification.

Towards the end of the project I went to an OSS Watch event, and spoke with Ross Gardler about what we were doing. Ross explained the Apache Incubator model to me, and from there on I was hooked.

Fast forward to 2013, and Apache Wookie is out of the incubator and a top-level Apache project, and is now on its seventh official release (the last one was in April). It’s not a huge project – the team is still small, though it’s far more diverse than when we started out.

The tempo of development has also slowed in recent years. However, in part that’s due to the maturing of the software to a point where code churn for its own sake has a negative impact on the projects that depend on it. Most recent updates have been fixing bugs affecting deployment in various unusual configurations, driven largely by reports from users. So this isn’t necessarily a bad thing!

Something that has also had a very positive impact on the project is having a very active downstream project – Apache Rave. This has driven a lot of improvements to Wookie to improve integration and deployment.

Two major EU projects have been working with Wookie and Rave over the past two years, and are coming towards their end – one this year, and the other in 2014.

Unlike previous projects they have focussed on working with existing software projects rather than going it alone, and have contributed code, user studies and content.  This has been a great experience, and hopefully future projects can learn from this approach.

Wookie stands as an example of how OSS Watch can help take work from within the HE sector and turn it into a sustainable open source software project; and as a beneficiary of this approach I’m keen to offer the same help I received to others.

Do you think your University-based project has the potential to go further? If so, get in touch!

Read the updated case study on Apache Wookie at OSS Watch.

(Photo by Silus Grok, used under CC-BY-SA license)

4 Tips for Keeping on Top of Project Dependencies

Almost any software project involves working with dependencies – from single-purpose libraries to complete frameworks. When you’re working on a project it’s tempting to bring in libraries, focus on meeting the user need, and figure out the niceties later. However, a little thought early on can go a long way.

Photo of a stack of cards

This is because every dependency can bring its own licensing obligations that affect how you are able to distribute your own software. In some cases, in order to release the software under a particular license you may end up having to rewrite substantial amounts of software to remove reliance on a library or framework that is distributed under an incompatible license.

So there is a tradeoff between being agile and productive in the short term and the risk of needing to do a costly refactoring triggered by a compatibility check before – or, even worse, after – a release.

For larger projects, and organisations with multiple projects, this starts to stray into the territory of open source policies and compliance processes, but for this post let’s just focus on the basics for small projects.

1. Make it routine

A good strategy is to build good dependency management practices into your general software development practices – similar to the concept of building in quality or building in security.

In other words, given that the cost of fixing things later can be significant, it’s worth investing in the practices and tools that can ensure potential issues are spotted and fixed earlier.

At its simplest, this can just mean developing a greater awareness as an individual developer of where your code comes from,  knowing that what you reuse can limit your choices for how you license and distribute your own code.

So in practical terms, this means being careful about copying and pasting code from the web, and making sure you know the licenses of any dependencies, preferably before working with them, but certainly before building any reliance on them into your code.

It may also make sense to handle any required attribution notices for inclusion in NOTICE and README files as you go along, rather than relying on a release audit to pick them up.
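As a purely illustrative example (the library name and URL here are invented), a NOTICE entry might read:

  This product includes FooParser (http://fooparser.example.org),
  Copyright (c) 2012 The FooParser Authors, used under the Apache License 2.0.

The exact attribution wording required varies from licence to licence, so check the terms of each dependency.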

2. Let tools take some of the strain

There are also tools that can help make things easier. For example, if you use Maven for Java projects, there is a License Validator plugin that can help flag up problems as part of your compile and build process.

Alternatively, Ninka is an Open Source tool for scanning files for licenses and copyrights. While it can’t follow import declarations or dynamically linked libraries, it can be useful for periodically checking builds. A similar project is Apache RAT (Release Audit Tool), which was originally created for use within the Apache Software Foundation for reviewing releases made in the Apache Incubator.
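As an illustration, a one-off release audit with RAT can be run from the command line with something along the lines of java -jar apache-rat-0.10.jar /path/to/project, which reports files with unknown or missing licence headers – the exact invocation and version will vary, so check the RAT documentation for the release you download.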

For larger projects and organisations there are also complete open source policy compliance solutions like Protex from Black Duck, or Discovery from OpenLogic.

It’s also worth pointing out that, while tools can be a part of the solution – and can be invaluable for large projects – ultimately it’s still your responsibility to make sure you meet the obligations of the software you are reusing.

3. Remember to check more than just the licences!

If a dependency has a compatible licence, that’s great. But what if the project that distributes it doesn’t bother checking its own dependencies?

This is where it’s good to have an idea about the governance and processes of projects you depend on.

There aren’t just licensing risks associated with dependencies – if you rely heavily on a library that has only one or two developers then you also run the risk that it may become a “zombie” project with implications for the rest of your code, for example, if security patches are no longer being applied.

A zombie

Beware of zombie projects!

The commercial tools mentioned above are also typically backed by a knowledge base that can flag up other issues with dependencies, such as governance or sustainability problems.  However, for most smaller projects, simply looking a library up on Ohloh is often good enough to check that it is still “live”.

If you need to know more about the sustainability of a particular project, OSS Watch can carry out an Openness Review to check its viability using a range of factors – get in touch with us if you want to know more.

4. Keep track of past decisions and share knowledge with colleagues

Some organisations make use of component registries to keep track of which components they have approved for use in their software projects. This can save developers time spent researching the same libraries, and makes most sense when you have a lot of projects that need the same kinds of components, so that focussing on reusing the same set of libraries pays off.

Another reason for using a registry is where you need to perform more detailed evaluations, for example for security, and so checking a dependency is more involved than just figuring out which license it uses, and that the project isn’t dead.

Some examples of commercial registries are Sonatype Component Lifecycle Management and Black Duck Code Center. Again, for a smaller project or an organisation with a relatively small set of projects this can be overkill, and just having a shared document somewhere where you can keep note of which libraries you’ve used can be effective.

For example, you could share a spreadsheet with colleagues containing some basic information on each library like what version you’re using, what license it’s under and the date and results of any investigations you’ve done into sustainability, security or risk assessment.
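A purely illustrative layout (the library names are invented) might be:

  Library     Version   Licence      Last reviewed   Notes
  FooParser   1.2.3     Apache-2.0   2013-06-10      Active project, used in importer
  libwidget   0.9       LGPL-2.1     2013-05-02      Dynamically linked only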

Is it worth it?

Reusing code is good practice and should save you time and expense – so it’s annoying if the administration associated with it starts affecting your productivity.

You can make a judgement call about what level of risk you feel is acceptable; for example, on an internal-only research project the risk of having to undergo a major refactoring should the project be successful may be one worth taking.

However, for a production system, or a component that is itself intended for reuse, you may just have to accept that you have to be a bit more diligent in how you reuse code.

Photo by DieselDemon used under CC-BY-2.0.

Rebooting Open Source Projects

Restart Button

One of the most annoying things that can happen to software is for it to become “abandonware”, particularly if it’s something you rely on.

This is where the original developers of the software cease further development or support, usually to move on to other projects, or sometimes due to a change of employment or other personal circumstances. Whatever the reason for it, the software that was once a key part of your infrastructure ceases to be updated, and over time it’s going to be less and less viable.

However, with Open Source there is sometimes the potential to “reboot” a software project even when the original developers have moved onto other things.

This recently happened to me with a library called HtmlCleaner. HtmlCleaner is a handy little library that, as the name suggests, cleans up “dirty” HTML and allows you to modify it programmatically. There are other libraries for doing this of course, but HtmlCleaner always had the edge for me due to its light footprint, small number of dependencies, and an API that provided the functions I needed. HtmlCleaner is used in Apache Wookie, for example, to clean up HTML in web widgets and inject additional JavaScript needed for extra features.
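To give a flavour of how it’s used, here is a minimal sketch of cleaning and re-serialising some broken markup – treat it as illustrative only, as method names and details may differ between HtmlCleaner versions:

  import org.htmlcleaner.HtmlCleaner;
  import org.htmlcleaner.SimpleHtmlSerializer;
  import org.htmlcleaner.TagNode;

  public class CleanExample {
      public static void main(String[] args) throws Exception {
          // Parse some "dirty" HTML into a tree of TagNodes
          HtmlCleaner cleaner = new HtmlCleaner();
          TagNode root = cleaner.clean("<p>unclosed <b>markup");

          // The tree can be inspected or modified programmatically
          TagNode[] paragraphs = root.getElementsByName("p", true);
          System.out.println("Found " + paragraphs.length + " paragraph(s)");

          // Serialise the tree back out as well-formed HTML
          String cleaned = new SimpleHtmlSerializer(cleaner.getProperties()).getAsString(root);
          System.out.println(cleaned);
      }
  }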

Unfortunately, the developer of HtmlCleaner, Vladimir Nikic, had moved onto working on other things, and the last release of the library was in 2010. Then recently we came across a bug in HtmlCleaner that was affecting Apache Wookie.

After some Googling I found that others had come across the same problem, and had forked the project to apply their own fixes. This didn’t really solve the problem, as you then have to ask yourself which of the forks are viable and up to date, and forking also runs the risk of fragmenting the community.  Wouldn’t it be better if we could get those interested enough to make a fork to take an active role in the core project?

HtmlCleaner already has a large user community, and the project regularly receives new patches and feature requests; all that was missing at a minimum was someone available to thank users for their contributions, review and apply patches, and manage the releases.

I got in touch with Vladimir, and Patrick Moore, creator of one of the forks, and we agreed to “reboot” the HtmlCleaner project, with Patrick and I getting maintainer access.

I applied a user-contributed patch to fix the bug that was vexing Apache Wookie users, and released HtmlCleaner 2.2.1. Patrick is merging back in the various fixes and improvements that he’d made to his fork, to be released as 2.4. After a gap of two years, HtmlCleaner is back in business!

Of course, in another year or two perhaps both Patrick and I will have moved onto other things too. But if we do, hopefully like Vladimir we’ll be able to hand over to new developers coming from within the community.

Some open source foundations and umbrella projects list dormant projects that can be rebooted;  for example, the Apache Attic is where Apache projects head when they are no longer active, and there are processes for allowing a new project team to restart activity, either by moving the project back into the incubator, creating a fork, or forming a new project management committee.

If projects have been dormant for too long, technologies and frameworks may have moved on so far that reviving them is not a viable option, but in some cases there can be mileage in reviving  one of these projects rather than starting from scratch, particularly if there is still a user community to draw upon.

 (image by _bianconero licensed under CC-BY)