Archive

Manage your research data safely with open source

The amount of data that is being generated is still rapidly increasing and both the commercial and the academic sector are working to tackle new challenges that arise from it. These are exciting times for open source projects like Apache Hadoop, a framework that allows for the distributed processing of large data sets across clusters of computers. Many big IT players like Microsoft, IBM, Oracle and Amazon use Hadoop in their offerings.

Academic researchers also continue to generate bigger and bigger data sets. This poses challenges not only for processing the data (something Hadoop can help with) but also for managing it. Managing data involves aspects like version management and longer-term curation, to make sure the data sets are and will remain available, just like the scientific publications that were based on them.

One exciting project that OSS Watch is currently involved with is DataFlow. This is a project that is tackling the issue of research data management in two stages.

Firstly, there is a software tool called DataStage. In a way this tool works similarly to the popular tool Dropbox: researchers save files to a dedicated location on a network drive, which means each file is stored on a departmental server and version-managed automatically. A new version is created whenever a file is changed and saved onto the drive, so the researcher can always go back to a previous version of the file if necessary.
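The version-on-save behaviour described above can be illustrated with a minimal sketch. This is not DataStage's code or API; the function names (save_version, restore_version), the store layout and the use of a content hash to detect changes are all assumptions made purely for illustration.

```python
import hashlib
import shutil
from pathlib import Path
from typing import Optional


def save_version(source: Path, store: Path) -> Optional[Path]:
    """Copy `source` into `store` as a new numbered version if its content changed.

    Returns the path of the new version, or None when the content is
    identical to the most recent stored version. (Hypothetical helper.)
    """
    digest = hashlib.sha256(source.read_bytes()).hexdigest()[:16]
    versions_dir = store / source.name
    versions_dir.mkdir(parents=True, exist_ok=True)

    # Zero-padded version numbers keep lexical order equal to save order.
    existing = sorted(p.name for p in versions_dir.iterdir())
    if existing and existing[-1].endswith(digest):
        return None  # nothing changed since the last saved version

    target = versions_dir / f"v{len(existing) + 1:04d}_{digest}"
    shutil.copy2(source, target)
    return target


def restore_version(name: str, number: int, store: Path, destination: Path) -> None:
    """Copy stored version `number` (1-based) of file `name` to `destination`."""
    versions = sorted((store / name).iterdir())
    shutil.copy2(versions[number - 1], destination)
```

Calling save_version each time a file is written would build up a numbered history per file, and restore_version would let a researcher retrieve any earlier state. A real system would also need to handle concurrent writes, large files and access control, none of which this sketch attempts.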

The second stage of data curation is when a file or a set of files is finally used for a publication and the researcher wants the data set to be available for other researchers, or wants to include a DOI reference to the data set. The researcher can then copy the file over to DataBank, an institution-level research data repository.

Both DataStage and DataBank are open source software projects, so we welcome potential users and developers to try them out. The projects are released under the permissive MIT open source licence. This makes it possible for commercial companies to include the software in a proprietary offering.

Many universities are looking for a solution for research data management, and we believe that the tools DataFlow is developing are well placed to fulfil that requirement. Join us on our mailing list and find out more about this project!

Build a better mousetrap

“Build a better mousetrap, and the world will beat a path to your door” as Wikipedia informs me Ralph Waldo Emerson never quite said. The point – that real innovation sells itself – remains true today. Indeed it could be argued that the average consumer is more engaged with the heartbeat of technological innovation now than ever before, with software releases making headlines among the more traditional stories of war and celebrity.

Emerson’s non-quote does raise a question, however. How do we identify technology which is better? With mousetraps there are some fairly obvious metrics relating to mouse mortality and cheese preservation, but not all inventions are as easy to benchmark. The last few weeks have seen announcements of upgrades to the world’s two most commonly used smartphone operating systems: Apple’s iOS (version 5) and Google’s Android (version 4). Each brings a raft of new features, although in both cases it has to be said that these new features are no longer as core to the operation of the device as innovations in earlier versions. Voice-operated search and facial recognition are nice, but hardly essential elements of a mobile computer, at least for now. Perhaps lost in the combative comparisons deployed by proponents of each OS is the fact that a genuinely key ability – web browsing – is implemented on both platforms using essentially the same code: the WebKit open source project. While newer functionality is added by Google and Apple to differentiate the competing products, it pays them to cooperate on key, unavoidable elements of their offerings. Given this, it’s fair to repeat the question – how do we identify real innovation? The newer differentiating features appear to be the cutting edge of endeavour, but their very newness is a demonstration that – up to now at least – they have not been essential elements of the technology in question. Some of them will die away despite their novelty, having never truly improved the invention that they embellish. Like a cheese grater on your mousetrap, it’s possibly a nice idea and undoubtedly novel, but how useful is it really? Only time will tell, and in the meantime better springs, and better browsers, are being developed.

So perhaps the question needs to be: “looking back at innovations that have proved to be key, how do they tend to develop?” Using the answer to this, we might be able to form some techniques for looking at our cutting-edge-but-possibly-pointless innovations and making guesses about their eventual utility. We might even be able to identify over-arching strategies for conducting and rewarding innovation…

Here we get into an argument that flared up earlier this month, when a video recorded back in June of Francis Gurry, the Director General of the UN’s World Intellectual Property Organization (WIPO), was discovered by the internet commentating community. Gurry was summing up his views on a debate which had just taken place on ‘Accelerating Growth and Development’ in relation to invention and intellectual property. Gurry’s argument was seemingly summed up by the headline on the BoingBoing article which drew it to the internet’s attention: “WIPO boss: the Web would have been better if it was patented and its users had to pay license fees”. Reading the article, though, even the quote that BoingBoing had pulled failed to use that emotive word ‘better’:

Intellectual property is a very flexible instrument. So, for example, had the world wide web been able to be patented, and I think that is a question in itself, perhaps the amount of investment that has gone into or would be able to go into basic science would be different. If you had found a very flexible licensing model, in which the burden for the innovation of the world wide web had been shared across the whole user community in a very fair and reasonable manner, with a modest contribution for everyone for this wonderful innovation, it would have enabled enormous investment in turn in further basic research. And that is the sort of flexibility that is built into the intellectual property system. It is not a rigid system.

Reaction to the video from proponents of open content and open source across the internet was voluble and aggravated. Gurry was accused of being ideologically indoctrinated and blinkered, tied to anachronistic models of IP registration and exploitation even in the face of the incredible growth and success of the web largely without the intervention of these models. In fact though, the most that Gurry says is that the web would have been ‘different’. Taken in the context of the statements which preceded it (and which you can hear by downloading the video), in which the value of the traditional IP systems had been questioned repeatedly, Gurry’s statements do not really support the distillation they were given, and which caused so much anger. He is trying to argue that the web could have grown within more traditional licensing structures. Whether he is right about this or not, he is not claiming here that it would have been ‘better’ under those circumstances.

The anger and confusion here are natural, though. The battle lines between proponents of the traditional and the more ‘open’ approaches to innovation (and here we should note that the buzz phrase ‘open innovation’ often itself refers to deeply traditional IP exploitation patterns) have long been drawn, and the forces on both sides are keen to tackle and destroy the arguments of their opponents wherever they see them. The web is often perceived – with much justification – as a triumph of innovation outside the traditional IP exploitation framework. To hear someone perceived as being part of the old guard even discussing it can seem presumptuous to some ears. Yet in reality the implied dichotomy here is simplistic. The open licensing movements themselves are underpinned by the arcane operations of traditional licensing and exploitation. While they may give these operations an innovative twist, open licences could not be enforced or defended without them. Conversely, Gurry’s example of why the patent regime is beneficial fails to address the criticisms of openness proponents. He points to the publication framework implicit in the current patent system, and makes the comparison between the saxophone – whose design is fully documented thanks to its having been patented – and the violin – where many secrets of producing the greatest instruments have been lost through secrecy and the passage of time. This critique – while interesting – is almost wholly inappropriate as a defence of the current system in opposition to more open models. In the modern case, both models involve complete publication – the distinction lies in how benefits are reaped from exploitation and by whom.

Given the frequent failures of either side in this debate to engage with what the other is actually saying – illustrated by this sad tale – it’s not surprising that telling which innovations are better remains hard. While ideology is important, it can often obscure our view of what actually matters to most people: how many mice are killed (or indeed captured).

Meritocratic project governance in action

At OSS Watch we work with projects that develop software within and for academia. When these projects use the open development approach, they have the opportunity to engage third parties and receive contributions from people with an interest in the project. To be able to do that, the project needs a governance model that makes it clear to external parties how to engage with it. The Cocoon project at the Apache Software Foundation (ASF) is a classic example of how a meritocratic governance model works in practice. In this post, I will highlight parts of a case study that Andrew Savory has written about the project. It demonstrates how project governance at the ASF is organised and can help you when considering a governance model for your own project.

SaaS – Who shares wins?

Fairly often we are asked by projects we advise whether they can make an open source release of their code while controlling whether others can make money from it. In fact, we get asked it so often that it now has a place in our FAQ document:

3.8. Can I restrain commercial reuse of my code using an open source licence?

Not directly, no. Point 6 of the Open Source Definition states that a pre-requisite for any licence to be considered open source is that it should make ‘No Discrimination Against Fields of Endeavor’. Commercial exploitation is a field of endeavour.

Having said this, some open source licences make code that they cover less appealing for commercial entities to include in their products. In general, licences with some element of ‘copyleft’ such as the GNU GPLv2 or the Mozilla Public License will compel commercial reusers to make some or all of their own code open source. Permissive licences such as the BSD License and the Apache License v2 on the other hand allow the code they cover to be added to a closed source project without compelling any other code to be open source.

Recently we received a novel variation on this query, and I have adapted my answer to be a post for this blog. It’s not yet a frequently asked question, but it may well become so. The questioner had written some code which implements a network service, and wanted to know if they could make an open source release while simultaneously discouraging others from selling hosted services based upon the code…

There are free and open source software (FOSS) licences which make conditions about this kind of use of the software that they cover. The Open Software License v3 and the GNU Affero GPL v3 are examples. There is a brief explanation of this kind of licence condition on the OSS Watch site.

All that these licences require, however, is that licensees who deliver functionality of the software over the network must make the source code to their version of the code available to users. This ensures that improvements to the software are released back to the community, but it may not act as a strong disincentive to those providing the service; after all, it does not compromise their core offering of network services. Contrast this with the more traditional model of distributing the software itself, where basing a product on (for example) GPL v2 code would mean that your own product must also be GPL v2 licensed, compromising the ability to charge for licences; after all, customers could onwardly distribute the product to other potential customers, undermining the business model.
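The difference between these two triggers (distributing copies versus offering the software over a network) can be sketched as a small lookup. This is a deliberately coarse simplification of the conditions described above, not a statement of what any licence text requires in a given case; the Licence class, the LICENCES table and must_share_source are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Licence:
    name: str
    copyleft: bool        # distributing derived code triggers a source obligation
    network_clause: bool  # offering the software over a network also triggers it


# Coarse, illustrative classification of licences named in this post.
LICENCES = {
    "BSD": Licence("BSD License", copyleft=False, network_clause=False),
    "Apache-2.0": Licence("Apache License v2", copyleft=False, network_clause=False),
    "GPL-2.0": Licence("GNU GPL v2", copyleft=True, network_clause=False),
    "AGPL-3.0": Licence("GNU Affero GPL v3", copyleft=True, network_clause=True),
    "OSL-3.0": Licence("Open Software License v3", copyleft=True, network_clause=True),
}


def must_share_source(licence_key: str, distributes_copies: bool, hosts_as_service: bool) -> bool:
    """Return True if, under this simplified model, the reuser must publish source."""
    licence = LICENCES[licence_key]
    if distributes_copies and licence.copyleft:
        return True
    if hosts_as_service and licence.network_clause:
        return True
    return False
```

Under this model a hosted service built on GPL v2 code that never distributes copies has no source obligation, whereas the same service built on Affero GPL v3 code does. That is exactly the gap the network-use licences were written to close.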

The second – more traditional – example there is the basis for the common dual licensing FOSS business model. The project provides a ‘community’ version under a copyleft FOSS licence and a commercial licence for businesses wanting to build a product on the code base for which they charge a licence fee. The model works because the copyleft licence has conditions that the business will pay to avoid. In the ‘application service provider’ model we are discussing here however, the condition that source code must be provided may well not be onerous enough for the application service provider to pay to avoid, particularly if they have not invested in improving the code in any way.

The upshot of this is that – if one wants to restrain competing businesses from using the software to provide a network service – a FOSS licence may not be the optimal choice. Having said that, it might be worth examining the motivation behind restraining competitive network service provision in the first place. In the worst case, where the competitor uses the code unmodified and therefore sacrifices nothing by making that source available freely, they are also necessarily behind the project maintainers in terms of consideration of improvements and new features. Given a choice between buying hosting from the originators of the software and a copy-cat business which makes no contribution to the code base, customers may well choose the former. Conversely, if the competing business is actively improving the code and making those improvements available as a result of the conditions of the licence for the initial project, this could be seen as a positive outcome that benefits the entire ecosystem and leaves businesses to compete on quality of service rather than code quality.

Cross-sector engagement at TransferSummit

Many inspiring conversations made the TransferSummit conference, held last week at Keble College in Oxford, a great success. The conference was chaired by OSS Watch alumnus Ross Gardler, now at OpenDirective, who was responsible for putting together an excellent programme with a diverse range of sessions on open innovation in software. OSS Watch is a Silver Sponsor of TransferSummit and co-organised the conference. The programme was carefully crafted and the sessions followed on from each other very nicely, telling the story of how open innovation can be successfully applied to software. We set up virtual tracks on specific topics to help delegates find their way through all the sessions. Still, an often-heard comment was that attendees found it difficult to choose which session out of the three tracks to attend, because so many appealed to them.

An addition to the overall programme this year was the Open Gadget Playtime. This showcased a number of open source hardware projects and innovative hardware running open source software. For example, a RepRap printer was shown in action: an open source 3D printer that can print many of its own parts. Another example is the Raspberry Pi project, which aims to develop an ARM Linux box for $25, specifically intended for teaching computer programming to children. This led to many interesting discussions about other applications of this technology, for example as part of a media streaming network.

Sessions were presented by experts from diverse backgrounds, spanning the academic, business and public sectors. This led to many interesting conversations about shared experiences and the challenges of open innovation.

One compelling example of the kind of cross-sector discussions and engagement that TransferSummit facilitated was found in a session about the Apache Rave (Incubating) project. Matt Franklin of Mitre, with help from Ate Douma of Hippo, gave a presentation on how the project came about as a result of an initiative from six organisations, combining three code bases. In the audience was John Pybus of Oxford’s OeRC, which is working on the Bamboo project, an international collaborative research project that is developing tools and infrastructure for humanities researchers. As part of that project they are developing a Virtual Research Environment (VRE) platform, and they are already working together with one of the project partners on the Rave project. However, there is a compelling argument to move from reusing their isolated code base towards using and engaging with the Rave project. That would enable them to benefit from the programming efforts put in by all organisations involved. For Bamboo, it is important to know that they can build on a code base that will be sufficiently mature and supports the main features they need. For Rave, it is an excellent opportunity to have another academic partner involved with the project, and with Bamboo’s input they can support a wider range of use cases.

This conversation and many others started at TransferSummit will hopefully lead to interesting new fruitful collaborations. OSS Watch will continue to play a facilitating role to connect people and projects relevant to the Higher Education sector to help foster open innovation. Do let us know about inspiring conversations you had at TransferSummit in the comments.

Open Source Junction 2 – video feedback

We asked a few speakers and participants at Open Source Junction 2 about the benefits they saw in building a community of industry and academic folks interested in mobile technologies.

The answers varied, but people seemed to perceive the diversity of points of view involved in this cross-cultural exercise as beneficial. The delegates with an industry background praised the deep thinking and creativity of researchers, while academics were impressed with the down-to-earth revenue-generating mindset of their industry colleagues.

The early feedback we got from the first event of the series was equally positive.

Open Source Junction 3, due later this year, will continue to facilitate the encounter between the academic and business cultures focused on open development in the mobile sector.

Watch this space.

Video feedback from:

  • Camille Baldock, Softwire
  • Steven Gray, University College London
  • Julian Harty, eBay
  • Samuel Carlyle, Sukey
  • Nick Allott, NquiringMinds
  • Tim Fernando, University of Oxford
  • Philipp Breuss-Schneeweis, Wikitude
  • Yuwei Lin, University of Salford
  • Dave Raggett, W3C
  • Gabriel Hanganu, OSS Watch

Standard alchemy

One of the questions that comes up over and over again when we talk to software authors about licence choice is: “what is each type of licence especially good for?” Personally, when discussing permissive licences I tend to point to the imperative of pushing an open data standard as a prototypical use case. Luckily for me, computer graphics industry giants (and competitors) Lucasfilm and Sony Pictures Imageworks have just released v1.0 of their 3D scene format Alembic to prove the point.

Anyone who has dabbled in 3D computer graphics, perhaps using the FOSS modeller and renderer Blender, will have noticed just how many competing object and scene description formats there are out there. While this is a minor inconvenience for amateurs like me, it becomes a massive and expensive problem for visual effects production houses and their clients. Complex animated objects and scenes created in one application have to be laboriously exported frame by frame and reconstructed in the target application, often requiring duplicated effort to complete the import. Alembic avoids this by allowing entire complex animated sequences to be compressed, exported and imported.

So for CG professionals there is a clear need for something like Alembic (the name is taken from the piece of chemistry glassware which distils products and delivers them to another container). However, for the creators of the individual software packages the idea of a simple interchange format is not a high priority; after all, why should they make it easier for clients to use another company’s software? It’s natural, therefore, that it is the clients themselves who have produced the standard and software which ‘scratches this itch’. Indeed there’s a precedent in the same industry. Back in 2003 Industrial Light and Magic released OpenEXR, a data standard and implementing software that allowed the interchange of so-called ‘HDR’ images (images with a greater range of luminance than standard image formats can store). In eight years OpenEXR has become a widely used standard. With any luck, Alembic will follow in its footsteps.

Top 10 IP and licensing tips when licensing open data and open content

This guest post has been contributed by Naomi Korn and is based on a series of 10 Minute Blog entries that Naomi has written for the JISC-funded OER IPR Support project, for which she is the Project Director. Naomi is the co-author, together with Charles Oppenheim, of Licensing Open Data: A Practical Guide.

Editor’s note: This post addresses IP issues surrounding open data and open content rather than open source software. Whilst open data and content are outside OSS Watch’s remit, they are, of course, pertinent to the world of open source software and we welcome Naomi’s thoughts and expertise.

1. Identify the IPR and other legal issues which may be associated with the data and content you wish to license. For example, even if there are no underlying IPR issues in your data and content, you may be constrained by contractual terms and conditions underpinning the supply of data etc. from third parties to you. You can read more about this at http://www.jisclegal.ac.uk/Projects/TransferandUseofBibliographicRecords.aspx

2. Don’t forget to identify all the layers of rights. There may be more than one layer of copyright materials, other types of IP (such as Performers’ rights) as well as other legal issues (such as Data Protection etc) which will need identifying and managing.

3. Decide how ‘openly’ you wish to license your data and content. Issues that may need to be addressed include:
  • controlling use for non-commercial uses only vs. allowing commercial exploitation by third parties and encouraging BCE etc.
  • requirements for attribution vs. the resulting possibility of attribution stacking
  • controlling reuse and repurposing but sacrificing potential interoperability when blending with content, data and software licensed under more open terms.

4. Remember that the more ‘open’ the use and repurposing of your content and data, the greater the risk if you have not cleared all the rights. This is particularly pertinent for in-copyright materials for which the rights holders are either unknown or cannot be traced (so-called ‘orphan works’). In these situations, the OER IPR Support Risk Management Calculator can be used to establish an indicative risk score to help inform decisions relating to risk management.

5. Risk management is increasingly important in the provision of access to open content where it may not be clear who created what and who owns what rights (if any). An organisation’s approach to risk management should be supported from the bottom up, by a realistic understanding of the nature of the work and its proposed use, and from the top down, by recognition that the organisation’s understanding and acceptance of necessary risks needs to be agreed, captured in policies and, where possible, mitigated. This is an important component in the development of an appropriate corporate governance framework to support the delivery of open content and open data.

6. Consider how the licensing of your data and content relates to the licensing of other types of materials such as open source software, and whether one broader licence, such as the Open Government Licence which covers data, software, content etc might be more beneficial than multiple licences.

7. Clear permissions with any third parties (as per 1 above), making sure that permissions that are sought are either the same or more than the permissions that you then grant under your selected open licence – never less! The support video profiled on the OER IPR Support webpages can provide more insight about this issue.

8. Remember, open licences are often irrevocable, global and in perpetuity, so make sure that you are happy with what you intend to do with your data and content before you license it out. In the worst case, openly licensed resources can be removed from the web etc., but permissions granted up to that point cannot be revoked.

9. Get permissions in writing, such as emails etc from any third party rights holders. Verbal permission is not adequate.

10. Extract key information relating to third party permissions and store it in a suitable system which is centrally accessible, to prevent the ‘siloing’ of core rights management information. This is particularly important if projects are funded for a specific period of time, such as JISC projects, but where the permissions to use the materials may be subject to certain limitations and/or crediting requirements etc. The system should also provide a place to record rights holders’ contact details in case further contact is required.
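Tip 7 above (the permissions you obtain must be the same as or broader than those you then grant) amounts to a simple superset check, which a rights-management system of the kind described in tip 10 could run automatically. The sketch below is illustrative only: the permission labels and the function name missing_permissions are hypothetical and do not correspond to the terms of any particular licence.

```python
def missing_permissions(obtained: set, granted: set) -> set:
    """Return the permissions you plan to grant but have not obtained.

    An empty result means the permissions obtained from third parties
    cover everything the chosen open licence would grant onwards.
    """
    return granted - obtained


# Hypothetical example: a third party allowed copying and redistribution only.
obtained_from_third_party = {"copy", "redistribute"}
granted_by_open_licence = {"copy", "redistribute", "adapt", "commercial_use"}

gaps = missing_permissions(obtained_from_third_party, granted_by_open_licence)
if gaps:
    print("Seek further permission for:", ", ".join(sorted(gaps)))
```

In this example the check would flag that adaptation and commercial use still need to be cleared before the material can be released under the chosen open licence.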

“Recipe for Rip-Offs”

Here in the UK the Public Administration Select Committee has been looking into the poor record government has in procuring IT systems. The title of their report, “Government and IT – ‘A Recipe For Rip-Offs’: Time For A New Approach”, serves as a neat summary of the content. Stating the problem, the report says:

The UK has been described as “a world leader in ineffective IT schemes for government”. There have been a number of high cost IT initiatives which have run late, under-performed or failed over the last 20 years including: the Child Support Agency’s IT system, the IT system that would have underpinned the National ID Card scheme, the Defence Information Infrastructure Programme, the implementation of the Single Payments Scheme by the Rural Payments Agency, and the National Offender Management System (C-Nomis).

The main problem, the report says, is that the Government does not have the internal skills to specify and procure IT systems. As a result, departments tend to rely on large external contractors to manage the process of developing IT systems (and to subcontract to smaller businesses where necessary). Naturally this involves handing over very large amounts of both cash and power to the ‘head’ contractors, and it is this complete externalisation of the ‘IT customer’ function that the report points to as the key failing in previous large government IT procurements. The answer, therefore, is for departments to acquire better IT management skills and take on the management of the smaller subcontractors themselves.

This is not the only failing identified. It seems that Government also tends to ‘gold-plate’ (over-specify) security requirements even on systems that do not require it. The report also criticises the tendency to see IT projects as a distinct kind of problem rather than an exercise in change management like any other. Nevertheless, it is the ‘externalisation’ problem which looms largest in the report’s somewhat gloomy findings, and it is in this context that the issue of open source arises.

Early on the report identifies the creation of ‘a level playing field for open source software’ as one of the approaches to solving the problem of Government IT that had already been suggested. In the recommendations, we find that open source is mentioned in the context of providing an open data platform for Government-held data which could be developed upon by third parties to provide analysis and manipulation applications. While both of these suggestions are sound in themselves, I think it is in the core recommendation that we can see the best opportunity to realise value for the UK taxpayer from open source software and development.

While there are very large scale corporations offering open source solutions, the majority of bidders for Government IT contracts offer closed source solutions, often with the bidder themselves retaining ownership of the IPR in the resultant code and licensing it under very restrictive terms. If the current reforms succeed in getting departments to break down IT procurements into smaller interoperating sections and invite bids for these from smaller, more agile developers, the opportunity for existing successful open source projects to be the bases for Government IT solutions expands. Assuming that the newly-acquired IT experts within departments are able to meaningfully engage with the communities around these projects – both through their hired developers and as users themselves – then huge amounts of value in terms of code, user requirements and expertise which are currently locked into closed, non-functioning projects will be available for the good of the community at large. The projects themselves will learn how to interact with Government clients, and software components of general application will find their way back into the public space to benefit other large-scale users.

All of these benefits, though, depend not only on an openness to the use of open source software but also on expertise in managing the relationship with that software’s community. So while I heartily welcome the proposal that Government acquire the IT skills to take a hands-on role in managing their IT procurements, I hope that those IT skills will include expertise in exploiting the unique benefits of joining an open source community.

Open source innovation

In June 2010 I chaired the inaugural TransferSummit Conference (report) in Oxford. The byline for that conference was “Open Source, Open Development, Open Innovation”. In September this year we are running TransferSummit again. The tag line this year is “Open Innovation Everywhere.” I’m proud of the conference we have created.

Jay Lyman, Senior Analyst at The 451 Group, recently tweeted that the “overwhelming message @ Oscon so far is open source now driven mostly by innovation.” This in turn prompted Brian Proffit of IT World to write:

if you put yourself in the right mindset, this kind of innovation is easier to spot than ever. Open source isn’t “just” the alternative anymore, it’s becoming the mainstream technology in a lot of areas.

When we created TransferSummit the objective was to highlight the fact that open source is open innovation in software. However, unlike most open source related conferences, we are not interested in preaching to the converted. Open source and the innovation it brings are important to everyone, and TransferSummit is designed for everyone, no matter their existing level of open source knowledge.

TransferSummit 2011 has the same goal and has an equally impressive range of speakers for you to engage with and share ideas. We have:

  • 2 days’ trainings
  • 2-day conference across three tracks
  • Mobile focus, virtual track
  • Government summit
  • Hands-on learning and break-out sessions
  • Gadget playtime
  • Gala dinner
  • On-site housing
  • Intimate setting, limited to 300 participants

This is the must-attend event for 2011. Tickets are selling fast, so register now.