About Scott Wilson

Scott Wilson is the Service Manager at OSS Watch

Wookie: A case study in sustainability

At OSS Watch we periodically review all the resources on our main website to make sure they’re accurate and up to date. Last week it was time to revise our case study on Apache Wookie, which is a project I’ve been involved with for some time.

Wookie graffiti

OSS Watch became involved with Wookie while I was working in an EU project based at the University of Bolton. The project as a whole had done lots of interesting stuff, but as with many large projects the whole was somewhat less than the sum of its parts; the central joined-up platform wasn’t really going to take off after the project finished. However, in the process we had built quite a promising system for adding functionality to the core portal shell using the W3C Widgets specification.

Towards the end of the project I went to an OSS Watch event, and spoke with Ross Gardler about what we were doing. Ross explained the Apache Incubator model to me, and from there on I was hooked.

Fast forward to 2013, and Apache Wookie is out of the incubator and a top-level Apache project, and is now on its seventh official release (the last one was in April). Its not a huge project – the team is still small, though its far more diverse than when we started out.

The tempo of development has also slowed in recent years. However, in part thats due to the maturing of the software to a point where code churn for its own sake has a negative impact on the projects that depend on it. Most recent updates have been fixing bugs affecting deployment in various unusual configurations, driven largely from reports by users. So this isn’t necessarily a bad thing!

Something that has also had a very positive impact on the project is having a very active downstream project – Apache Rave. This has driven a lot of improvements to Wookie to improve integration and deployment.

Two major EU projects have been working with Wookie and Rave over the past two years, and are coming towards their end – one this year, and the other in 2014.

Unlike previous projects they have focussed on working with existing software projects rather than going it alone, and have contributed code, user studies and content.  This has been a great experience, and hopefully future projects can learn from this approach.

Wookie stands as an example of how OSS Watch can help take work from within the HE sector and turn it into a sustainable open source software project; and as a beneficiary of this approach I’m keen to offer the same help I received to others.

Do you think your University-based project has the potential to go further? If so, get in touch!

Read the updated case study on Apache Wookie at OSS Watch.

(Photo by Silus Grok, used under CC-BY-SA license)

4 Tips for Keeping on Top of Project Dependencies

Almost any software project involves working with dependencies – from single-purpose libraries to complete frameworks. When you’re working on a project it’s tempting to bring in libraries, focus on meeting the user need, and figure out the niceties later. However, a little thought early on can go a long way.

Photo of a stack of cards

This is because every dependency can bring its own licensing obligations that affect how you are able to distribute your own software. In some cases, in order to release the software under a particular license you may end up having to rewrite substantial amounts of software to remove reliance on a library or framework that is distributed under an incompatible license.

So there is a tradeoff between being agile and productive in the short term, against the risk of needing to do a costly refactoring triggered by a compatibility check before – or even worse, after – a release.

For larger projects, and organisations with multiple projects, this starts to stray into the territory of open source policies and compliance processes, but for this post lets just focus on the basics for small projects.

1. Make it routine

A good strategy is to build good dependency management practices into your general software development practices – similar to the concept of building in quality or building in security.

In other words, given that the cost of fixing things later can be significant, it’s worth investing in the practices and tools that can ensure potential issues are spotted and fixed earlier.

At its simplest, this can just mean developing a greater awareness as an individual developer of where your code comes from,  knowing that what you reuse can limit your choices for how you license and distribute your own code.

So in practical terms, this means being careful about copying and pasting code from the web, and making sure you know the licenses of any dependencies, preferably before working with them, but certainly before building any reliance on them into your code.

It may also make sense to handle any required attribution notices for inclusion in a NOTICE and README as you go along, rather than just rely on a release audit to always pick them up.

2. Let tools take some of the strain

There are also tools that can help make things easier. For example, if you use Maven for Java projects, there is a License Validator plugin that can help flag up problems as part of your compile and build process.

Alternatively, Ninka is an Open Source tool for scanning files for licenses and copyrights. While it can’t follow import declarations or dynamically linked libraries, it can be useful to periodically check builds. A similar project is Apache RAT (Release Audit Tool) which was originally created for use within the Apache Software Foundation for reviewing releases made in the Apache Incubator.

For larger projects and organisations there are also complete open source policy compliance solutions like Protek from Black Duck, or Discovery from OpenLogic.

It’s also worth pointing out that, while tools can be a part of the solution – and can be invaluable for large projects – ultimately it’s still your responsibility to make sure you meet the obligations of the software you are reusing.

3. Remember to check more than just the licences!

If a dependency has a compatible licence, thats great. But what about if the project that distributes it doesn’t bother checking their own dependencies?

This is where it’s good to have an idea about the governance and processes of projects you depend on.

There aren’t just licensing risks associated with dependencies – if you rely heavily on a library that has only one or two developers then you also run the risk that it may become a “zombie” project with implications for the rest of your code, for example, if security patches are no longer being applied.

A zombie

Beware of zombie projects!

The commercial tools mentioned above are also typically backed by a knowledge base that can also flag up other issues with dependencies, such as governance or sustainability problems.  However, just having a check for the project on Ohloh is often good enough for most smaller projects to check that a library is still “live”.

If you need to know more about the sustainability of a particular project, OSS Watch can carry out an Openness Review to check its viability using a range of factors – get in touch with us if you want to know more.

4. Keep track of past decisions and share knowledge with colleagues

Some organisations make use of component registries to keep track of which components they approve on in their software projects. This can save time spent by developers researching the same libraries, but makes most sense when you have a lot of projects that probably need the same kinds of components, in which case focussing on reusing the same set of libraries makes sense.

Another reason for using a registry is where you need to perform more detailed evaluations, for example for security, and so checking a dependency is more involved than just figuring out which license it uses, and that the project isn’t dead.

Some examples of commercial registries are Sonatype Component Lifecycle Management   and Black Duck Code Center. Again, for a smaller project or an organisation with a relatively small set of projects this can be overkill, and just having a shared document somewhere where you can keep note of which libraries you’ve used can be effective.

For example, you could share a spreadsheet with colleagues containing some basic information on each library like what version you’re using, what license it’s under and the date and results of any investigations you’ve done into sustainability, security or risk assessment.

Is it worth it?

Reusing code is good practice and should save you time and expense – so it’s annoying if the administration associated with it starts affecting your productivity.

You can make a judgement call about what level of risk you feel is acceptable; for example, on an internal-only research project the risk of having to undergo a major refactoring should the project be successful may be one worth taking.

However, for a production system, or a component that is itself intended for reuse, you may just have to accept that you have to be a bit more diligent in how you reuse code.

Photo by DieselDemon used under CC-BY-2.0.

Open Source and Open Standards key to future of public sector IT

Last week Open Source, Open Standards 2013 took place in London, an event focussed on the public sector. Naturally these being two topics we’re very keen on here at OSS Watch I went along too.

Overall the key message to take away from the event was just how central to public sector IT strategy these two themes have become, and also how policy is being rapidly turned into practice, everywhere from the NHS to local government.

Tariq Rashid, the Open Source policy lead for the UK Government, spoke of the need for IT to be focussed on user needs, and to deliver sustained value, by moving from “special” software procured for the public sector, to services delivered using commodified IT.

Even where services are unique to the public sector, Rashid and other speakers at the event made the case that most elements of such services can be delivered by building on commodified IT. For example, the open source CMS Drupal is used for delivering increasing numbers of public sector IT services, and the Government Digital Service builds its services from open source components.

The two strategies of Open Source and Open Standards are necessary as they create the ‘competitive tension’ needed to drive down cost and improve sustainability.

Mark Bohannon of Red Hat gave an overview of the global landscape of Open Source in government, in the US and UK, and identified the UK policies as being particularly forward looking. Mark positioned Cloud and Big Data as two key areas where Open Source and Open Standards were critical, calling out OpenStack and Hadoop as particular cases, and also provided some great case studies on open source from the military and from space exploration.

Mark made the point that Open Source and Open Standards underpin a more fundamental change in IT, away from big IT projects towards IT that is agile, modular and responsive to user needs.

Ian Levy of CESG dispelled some myths around security and Open Source (“If anyone in UK government says CESG has banned open source send their name to me and I’ll have them killed”) and made the case for a common sense approach to security, whether the software or service is open source or closed source.

Mark Taylor from Sirius has long been an advocate for open source in the public sector, and it was good to be at a point where the message has been heeded! He began with a nice Schopenhauer quote:

All truth passes through three stages. First, it is ridiculed. Second, it is violently opposed. Third, it is accepted as being self-evident.

In the talk he provided lots of practical advice for public sector organisations on putting Open Source into practice, which include calling on those writing tenders to focus on user needs instead of naming technology solutions. Mark also gave a workshop later in the day where he continued this theme, expanding on how public sector organisations and companies had made transitions to open source. Its not very easy to summarise here in a post, but I found the information very practical and useful; for example, when transitioning IT, to start with the systems furthest away from users, such as backend services and infrastructure, to avoid sparking the usual neophobia when you change technologies for users.

Inderjit Singh gave an overview of the NHS standards-based approach to IT, with some nice background on which approaches had been tried and where the current strategy is going. The current approach has been to use a programme of change projects involving SMEs that have engaged 40 new suppliers, and which is accelerating the take up of the standards.

Singh asserted that standards and fundamental for enabling an open architecture, and that open source and open standards go hand in hand in delivering value for users.

After some workshop sessions, we had Alasdair Mangham from the London borough of Camden giving us a look into how they’ve been building services using open source software in collaboration with SMEs. This involved a major shift in contracting – rather than write an huge set of requirements in a tender document, they disaggregated the project and bought in specialist capabilities (in usability, service design, SOA etc) as needed in smaller chunks of time using an agile process.

Graham Mellin gave an overview of the Met Office’s new space weather system built using open standards and using open source software; for their own specialist systems they decided to go down the route of making it Open Source rather than the private partner sharing route as result of an exploitation planning process.

I met with a lot of people at the event, from suppliers, local government, NHS and national government departments, and it was good to get a sense of how the public sector is moving – whatever the pace in individual areas – towards this vision of more affordable, sustainable and user focussed IT, and better utilising the capabilities of UK SMEs and startups.

We pointed out recently in our post in the Guardian, Higher Education in particular is in a strong position in this area as a result of past investments in Open Source and Open Standards, and we now need to think about how we take that forwards.

As Mark Taylor pointed out in his talk, the public sector accounts for over half of IT spend in the UK – and we can choose to either unite and use that market power to shape the future, or be divided up and conquered.

Shallow versus Deep Differentiation: Do we need more copyleft in the cloud?

In a previous post I discussed two different models for open source services; the “secret source” model, which is based on providing a differentiated offering on top of an open source stack, and a copyleft model using licenses that address the “ASP loophole” such as AGPL.

Another way of looking at these two models is in terms of the level and characteristics of differentiation that they afford.

Shallow versus Deep

Sea and clouds

If a service offering – and this applied whether its a SaaS solution or infrastructure virtualization or anything in between – uses a copyleft license such as AGPL, then this tends to encourage shallow differentiation. By this I mean that the service offered to users by different providers is differentiated in a way that does not involve significant changes to the core codebase. For example, service providers may differentiate on price points, service packages, location, and reputation, while offering the same solution.

There can also be differentiation at the technology level including custom configurations, styling and so on, or added features; however under an AGPL-style license these are also typically distributed under AGPL, so if service providers do want to extend and enhance the codebase, this is contributed back to the community. If a provider really did want to provide deep differentiation, it would effectively have to create a fork.

If a service offering instead builds on top of an open source stack using a permissive license such as the Apache license, then it becomes possible for providers to offer deep differentiation in the services they provide; they are at liberty to make significant changes to the software without contributing this back to the community developing the core codebase. This is because, under the terms of most open source licenses, providing an online service using software is not considered “distribution” of the software.

What does this all mean?

For service providers this presents something of a quandary. On the one hand, a common software base represents a significant cost saving as development effort is pooled, reducing waste. On the other, there is a clear business case for greater differentiation to compete as the market becomes more crowded.

How this is resolved is something of a philosophical question.

It may be that, acting out of self-interest, service providers will over time balance out the issues of differentiation and pooled development regardless of any kind of licensing constraint; the cost savings and reduced risk offered by pooling development effort for the core codebase will be clear and significant, and providers will apply deep differentiation only where there is very clear benefit in doing so, while contributing back to the core codebase for everything else.

Alternatively, service providers may rush to differentiate deeply from the outset, leaving the core codebase starved of resources while each provider focusses on their own enhancements. In this scenario, copyleft licensing would be needed to sustain a critical mass of contributions to the core.

Which is it to be?

Given that OpenStack and Apache CloudStack, two of the main cloud infrastructure-as-a-service projects, are both licensed under the Apache license, we can observe over the coming year or two which seems to be the likely scenario. Under the first model, we should see the developer community and contributions for these projects continue to grow, irrespective of how deeply providers differentiate services based on the software.

Under the second scenario, we should see something rather different, in that the viability of the project should suffer even as the number of providers building services on them grows.

As of now, both projects seem to be growing in terms of contributors; here’s OpenStack (source: Ohloh):

OpenStack contributors per month; rising from 50 in 2011 to 250 in 2013

… and here is CloudStack (source: Ohloh):

CloudStack contributors per month; around 25 over 2011-2012, rising steeply to 60 in 2013

(Both projects have slightly lower numbers of commits, though that can simply reflect greater maturity of codebases rather than reduced viability, which is why I’ve focussed on the number of contributors)

If the concerns over “deep differentiation” turn out to be justified, then community engagement in these two projects should suffer as effort is diverted into differentiated offerings built on them, rather than channelled into contributions back to the core projects.

Is deep differentiation really an issue for cloud?

Deep and shallow differentiation is a concept borrowed from marketing, and is sometimes used to refer to how easy it is for a competitor to copy a service offering. One example of this is the Domino’s Pizza “hot in 30 minutes or its free” service promise – it would be difficult for a competitor to copy this offering without actually changing the nature of its operation to match – it can’t just copy the tagline without risking giving away free pizza and going out of business.

In cloud services, its arguable how much differentiation will be in terms of software functionalities and capabilities, and how much on the operational and marketing aspects of the services: things like pricing, reliability, support, speed, ease of use, ease of payment and so on.

If the key to success in cloud is in amongst the latter, then it really doesn’t matter that most providers use basically the same software, and providers will want to take advantage of having a common, high quality software platform with pooled development costs.

A further problem with deep differentiation in the software stack is that this could impact portability and interoperability – having extra features is great, but cloud customers also value the ability to choose providers and to switch when they need to. Providers focussing on a few popular open source cloud offerings are another kind of standardisation, complementing interoperability standards such as OVF, and one that gives customers confidence that they aren’t being locked in; as well as being able to move to another provider, they also get the option to in-source the solution if they so wish.

Are there better reasons for copyleft?

It remains to be seen whether there really is a problem with the open cloud, and whether copyleft is an answer. Personally I’m not convinced there is.

However, that doesn’t mean copyleft on services isn’t important; on the contrary I think that licenses such as AGPL offer organisations a useful option when looking to make their services open.

Recent examples such as EdX highlight that AGPL is a viable alternative for licensing software that runs services, and that perhaps with greater awareness among service providers we may see more usage of it in future. For example, for the public sector it may offer an appropriate approach for making government shared services open source.

(Sea and cloud photo by Michiel Jelijs licensed under CC-BY 2.0)

MOOC provider EdX goes open source – with an interesting choice of licence

Earlier this month EdX, the nonprofit organisation set up by MIT and Harvard to provide a MOOC platform, released part of its code under an open source licence – the Affero GPL.

MOOCs – “Massive Online Open Courses” – have been hitting the headlines frequently in 2013, with high profile proponents and some big name backers. (For a good overview of the subject, I’d recommend reading MOOCs and Open Education: Implications for Higher Education, a white paper published by CETIS.)

The meaning of “Open” in MOOCs has been variously argued; however the prevailing model is one of open access to higher education, but not necessarily provided using an open platform.

The earliest MOOCs, and those operated by individuals or collectives rather than companies, have tended to operate using combinations of free of charge – though not necessarily open – services such as those offered by Google, WordPress, and Twitter, coordinated using open source course management platforms such as Moodle.

However, most of the high-profile commercially-oriented MOOCs have operated a bespoke online service based on either proprietary software, or solutions built using open source software but not necessarily available for others to replicate due to the so-called privacy loophole; that is, the modifications made to the software to deliver the service are not themselves required to be distributed to users.

The EdX announcement is interesting for two reasons – firstly that they are the first high profile MOOC provider to release open source – and secondly that they are doing so under the AGPL, one of a small number of open source licenses that specifically address the “privacy loophole”. This means that if you create your own MOOC service using EdX’s XBlock software, you must make the source code for the service – include your modifications – available to download under the AGPL. This is a form of “service provider copyleft” that ensures that EdX will have access to any improvements on their platform used by third parties.

This can be seen as a very cautious move – using the AGPL will ensure no other services can improve on the codebase without EdX getting access. However its also quite a bold one as it makes a clear distinction between how EdX sees “open” in contrast with Coursera, Udacity and others. (It remains to be seen what direction the UK’s FutureLearn initiative will take). It will also be interesting to see if other components of EdX will be released, and if so whether they will also use AGPL.

For more information on how online services use open source licenses, see How Can Services Be Open Source? 

The EdX XBlock code is available on Github.

Open Source meets Open Standards

OSS Watch Briefing Paper: Open Standards and Open Source

Open source software and open standards are two of the key interventions in technology policy, whether that policy is made by governments, public sector organisations, or companies.

Open standards can ensure interoperability and assist portability, allowing the switching of solutions and avoiding vendor lock-in. Standards can also help to create new markets, and can also encourage innovation within markets by imposing useful constraints.

Open source software offers benefits of greater flexibility and the potential for reduced development costs and better software quality through collaboration and reuse.

Together, open source and open standards provide the basis for solutions that offer interoperability, cost reduction, and flexibility; no wonder they are seen as such a powerful tool for technology policy!

However, whats often less clear is how the two interact in practice. There is, for example, a fairly widely-held view that open source software is somehow inherently more likely to support open standards. However, in practice this is not necessarily the case, and there are a number of barriers that can actually make it less likely for open source projects to implement standards than their closed-source counterparts.

Open source and open standards should complement one another - but can also counteract each others benefits if policies are developed without paying attention to the way they interact

For example, implementation of a standard requires access to documentation; in many cases this involves payment for access, or paid membership of a consortium – something that open source projects may have difficulty with unless a benefactor or sponsor does this  on their behalf. Also, if a project wishes to publicly claim that it implements a standard, this may involve a formal conformance process requiring paying fees for testing and accreditation.

So for policy makers and CIOs, the selection of standards, and the standards setting organisations they originate from, can have a significant impact on the availability of open source solutions to meet their requirements.

Mandating standards that involve patent licensing fees, mandatory expensive conformance testing and assurance, and restricted access to documentation will exclude many potential solutions and providers. This will have the impact of increasing costs, and potentially eliminating the benefits of standardisation altogether if organisations have little practical prospect of switching suppliers.

Conversely, if standards are selected that provide a low barrier to entry to open source then this can be good not just for individual solution procurement, but for interoperability as a whole.Unlike closed-source solutions, with open source it is possible to inspect the implementation of standards and to conduct independent interoperability and conformance testing rather than rely principally on vendor claims. The presence of open source implementations can also influence uptake of a standard; either by making open source libraries available for use within other products, or by providing a good target for interoperability testing for other entrants.

Open source and open standards are key components in technology policy; but its important to know how they can work together – and potentially work against each other.

A new OSS Watch briefing paper provides an overview of the main issues facing implementation of standards for open source projects and developers; for more information see Open Standards and Open Source.

 

UK Government mandates preferential selection of open source

While the UK government has for some time now been taking measures to level the playing field for open source software in the public sector, for example by pointing out open source options for proprietary systems, the new Government Service Design Manual goes one step further – mandating a preference for open source for government digital services.

An article in Computer Weekly pulls out some of the key paragraphs of the manual, which state that open source should be preferred “in particular for operating systems, networking software, web servers, databases and programming languages” and that proprietary products should only be used in some specific cases – and in those cases to use open standards to avoid lock-in.

A recent article on the new DCMS intranet service exemplifies the new approach to government web services, with the WordPress-based system costing 90% less than the one it replaces.

(Note that the Design Manual applies to creating government digital services, rather than for procuring software in general.)

Rebooting Open Source Projects

Restart ButtonOne of the most annoying things that can happen to software is for it to become “abandonware”, particularly if its something you rely on.

This is where the original developers of the software cease further development or support, usually to move onto other projects, or sometimes due to a change of employment or other personal circumstances. Whatever the reason for it, the software that was once a key part of your infrastructure ceases to be updated, and over time its going to be less and less viable.

However, with Open Source there is sometimes the potential to “reboot” a software project even when the original developers have moved onto other things.

This recently happened to me with a library called HtmlCleaner. HtmlCleaner is a handy little library that, as the name suggests, cleans up “dirty” HTML and allows you to modify it programmatically. There are other libraries for doing this of course, but HtmlCleaner always had the edge for me due to its light footprint, small number of dependencies, and an API that provided the functions I needed. HtmlCleaner is used in Apache Wookie, for example, to clean up HTML in web widgets and inject additional JavaScript needed for extra features.

Unfortunately, the developer of HtmlCleaner, Vladimir Nikic, had moved onto working on other things, and the last release of the library was in 2010. Then recently we came across a bug in HtmlCleaner that was affecting Apache Wookie.

After some Googling I found that others had come across the same problem, and had forked the project to apply their own fixes. Which didn’t really solve the problem, as you then have to ask yourself which of the forks are viable and up to date, and also run the risk of fragmenting the community.  Wouldn’t it be better if we could get those interested enough to make a fork, to take an active role in the core project?

HtmlCleaner already has a large user community, and the project regularly receives new patches and feature requests; all that was missing at a minimum was someone available to thank users for their contributions, review and apply patches, and manage the releases.

I got in touch with Vladimir, and Patrick Moore, creator of one of the forks, and we agreed to “reboot” the HtmlCleaner project, with Patrick and I getting maintainer access.

I applied a user-contributed patch to fix the bug that was vexing Apache Wookie users, and released HtmlCleaner 2.2.1. Patrick is merging back in the various fixes and improvements that he’d made to their fork, to be released as 2.4. After a gap of two years, HtmlCleaner is back in business!

Of course, in another year or two perhaps both Patrick and I will have moved onto other things too. But if we do, hopefully like Vladimir we’ll be able to hand over to new developers coming from within the community.

Some open source foundations and umbrella projects list dormant projects that can be rebooted;  for example, the Apache Attic is where Apache projects head when they are no longer active, and there are processes for allowing a new project team to restart activity, either by moving the project back into the incubator, creating a fork, or forming a new project management committee.

If projects have been dormant for too long, technologies and frameworks may have moved on so far that reviving them is not a viable option, but in some cases there can be mileage in reviving  one of these projects rather than starting from scratch, particularly if there is still a user community to draw upon.

 (image by _bianconero licensed under CC-BY)

A new future for OSS Watch

Following over 10 years of supporting Jisc’s programmes and projects, as a national advisory centre for open source projects, licences, and sustainability, OSS Watch will be moving to a new funding model as of 1st August 2013.

In response to the Wilson review of Jisc, the latter has confirmed that it will continue to support OSS Watch until 31st July 2013 but this will be its last year of core funding. This provides us with a new opportunity to model OSS Watch to meet the needs not only of HEIs but also units outside of HE & FE, including both the public and private sector.

We are currently finalising our new business model; our goal is to expand OSS Watch’s long established portfolio of services but on a chargeable basis. Moreover we are looking at how OSS Watch can be combined with Oxford University’s other activities in the open arena (open education, open access, open data) to present a truly unique offering and centre of expertise.

We would be very interested to hear from you if you are interested in future collaborations, research projects, or other activities; or believe that OSS Watch’s traditional services (advice on open source projects, products, licensing, sustainability, etc) would be of interest to you in the future.

(The original press release for this post is on the OSS Watch website)