Opportunities for scientific research in open source projects

There are many interesting open source projects that can benefit academic research. As Gabriel Hanganu's recent OSS Watch article on e-Research shows, there are social and organisational problems in adopting open source for e-Research, but there are also many open source software projects out there waiting to be joined. Some projects are very well suited to scientific research, and I feel this is especially true in the realm of big data databases.

Google really showed the way with its MapReduce paper in 2004. In it, Google published its programming model for processing large amounts of data in parallel, although it did not neglect to apply for a patent as well, which was recently granted. Hadoop, which has been developed largely at Yahoo!, also implements the MapReduce pattern, but it is completely open source, being a project of the Apache Software Foundation. More recently, Apache Cassandra has joined the mix. Cassandra originated at Facebook and became open source in July 2008. It recently graduated from the Apache Incubator and is now an official top-level Apache project.

Work has been initiated to integrate Cassandra with Hadoop, which, put simply, means that Cassandra can take the place of HBase, the Hadoop database. This has been discussed on the project's mailing list, and the feature has recently been implemented. So Yahoo! is working on Hadoop and Facebook on Cassandra, and Twitter has also recently announced that it is working towards using Cassandra for its backend. Also worth mentioning is Voldemort, an open source implementation of Amazon's Dynamo database. It is used and actively developed by LinkedIn, which makes it another example of a project where engaging with the community lets you benefit from the work a large company is investing.

To me, all this shows that major companies will make large investments in NoSQL databases in the coming years, and that this investment will go into open source software. That creates a lot of opportunity for anybody who has to deal with big data to profit from it: all you have to do is try out the software and engage with these projects. Researchers also have to cope with ever larger amounts of data, so I think they have good reason to follow these developments closely and step in to benefit.