Format? Y/N

Yesterday Martin Hawksey of JISC CETIS published an interesting blog post on the ugly problem of openly-licensed content wrapped in closed file formats. In that post Martin writes:

PSD is a proprietary file format developed and owned by Adobe and used in Photoshop. You can actually open and edit PSD files in open source tools like GIMP (I’m not sure how legally Gimp can do this…

…and goes on to note that he has asked OSS Watch to comment. I wrote back a quick email pointing to some issues at the levels of copyright, patent rights and trademarks, but with the headline summary: ‘it can be awkward on all levels’. In this post I propose to detail some of the issues that I, as a non-lawyer, can see with open source software authors implementing import and export of proprietary file formats.

Starting with copyright, the software whose functionality you are reproducing is very likely protected by copyright. It is also possible that the file format itself is a protected work. In the case of the program, we have a good reason for saying that we are not infringing its copyright by reproducing its file import/export functionality. The European Court of Justice’s ruling on some questions raised by the SAS v WPL case tells us that:

 “neither the functionality of a computer program nor the programming language and the format of data files used in a computer program in order to exploit certain of its functions constitute a form of expression of that program for the purposes of Article 1(2) of Directive 91/250.” (para 39)

There is also a limited copyright exception at EU level covering acts undertaken

 to obtain the necessary information to achieve the interoperability of an independently created program with other programs. (point 15)

under the Directive on the legal protection of computer programs (2009/24/EC). However, this would not cover the distribution of a program that would otherwise be infringing, just the copying etc. of the original program required to hack it about and get the information. The directive also notes that:

 An objective of this exception is to make it possible to connect all components of a computer system, including those of different manufacturers, so that they can work together. Such an exception to the author’s exclusive rights may not be used in a way which prejudices the legitimate interests of the rightholder or which conflicts with a normal exploitation of the program.

…which quite conceivably makes it unusable if your reverse engineering is aimed at producing a competing product, as opposed to just doing some systems integration plumbing. So in fact the reverse engineering exception is very narrow and likely does not cover the kind of work we are discussing here.

While the SAS v WPL ruling probably protects us from being accused of infringing the original program’s copyright, it does (rather worryingly) also note:

45. The Court also points out that the finding made in paragraph 39 of the present judgment cannot affect the possibility that the SAS language and the format of SAS Institute’s data files might be protected, as works, by copyright under Directive 2001/29 if they are their author’s own intellectual creation (see Bezpečnostní softwarová asociace, paragraphs 44 to 46).

So if the file format itself is a separate protected work, we might be infringing its copyright by writing and distributing code that creates similar works. How could this be? Well let’s take an exaggerated example. If I wrote a program whose data file format required that chunks of data be separated by the lines of an original sonnet I had written, then any compatible file would have to contain a copy of my sonnet, as would the code that creates the file. In real life of course things tend to be different: file formats are largely defined by structure, not content, and I can’t write sonnets. We have some reason to believe that structure in computer-based works is not in itself copyrightable; the decision in Oracle v Google, while not directly applicable here, and made by a US court, points in that direction. Overall though, because we can conceive of a file format which is a protected work under copyright, we have to acknowledge that – despite encouraging developments in the courts – we might still have a problem at the level of copyright.
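To make the exaggerated example concrete, here is a minimal Python sketch of such a format. Everything in it is hypothetical: the placeholder delimiter text stands in for the lines of an original sonnet, and the function names are invented for illustration. It also assumes that no chunk of data happens to contain a delimiter line.

```python
# Hypothetical delimiter text standing in for the lines of an original sonnet.
# In the thought experiment these would be the author's own creative work.
SONNET_LINES = [
    "PLACEHOLDER SONNET LINE ONE",
    "PLACEHOLDER SONNET LINE TWO",
    "PLACEHOLDER SONNET LINE THREE",
]


def write_chunks(chunks):
    """Serialise chunks, separating neighbours with successive sonnet lines."""
    parts = []
    for i, chunk in enumerate(chunks):
        if i > 0:
            # The writer has to embed the sonnet text to produce a valid file.
            parts.append(SONNET_LINES[(i - 1) % len(SONNET_LINES)])
        parts.append(chunk)
    return "\n".join(parts)


def read_chunks(text):
    """Recover chunks by splitting on the expected sonnet lines, in order."""
    chunks, current, k = [], [], 0
    for line in text.split("\n"):
        if line == SONNET_LINES[k % len(SONNET_LINES)]:
            # Expected delimiter reached: close the current chunk.
            chunks.append("\n".join(current))
            current = []
            k += 1
        else:
            current.append(line)
    chunks.append("\n".join(current))
    return chunks


if __name__ == "__main__":
    data = write_chunks(["first chunk", "second chunk", "third chunk"])
    print(data)
    print(read_chunks(data))
```

The point of the sketch is that the delimiter text appears in two places: in the source code of any compatible writer, and in every file it produces. That is what would make copying the format difficult to separate from copying the work.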

Patents are easier to consider, in that they are definitely a problem. Any code which implements a patented process will require a licence. We have solid examples of successful enforcement of patents against software engineers writing import/export functionality. I’m sure every internet veteran remembers the GIF Wars associated with Unisys’ successful enforcement of their patent on LZW compression, which was integral to the GIF image format. More recently Microsoft was successfully brought into compliance with i4i’s patent on methods for XML processing that were present in versions of Microsoft Office from v2003 onward. The latter example is interesting because the patent was not actually integral to the file format, but dealt with the particular way Microsoft had chosen to create and manipulate Office files in their software. Open Office did not use the same method, and was therefore able to generate files with the same content without requiring a licence from i4i. It is also interesting, and troubling, because it highlights the fact that at the level of patents, it is not just the authors of the original program you have to worry about: a third party may hold a patent that reads on your implementation.

Finally there is the issue of brands and trademarks. Some file formats have associated brands. Pantone Inc’s Hexachrome technology, for example, is protected both as a patented technology and as a registered trademark. Even if the patent were found to be invalid, someone writing software that imported or exported Hexachrome data would need a licence from Pantone Inc to describe their software as Hexachrome-compatible.

So for all these reasons I think my statement that ‘it can be awkward on all levels’ is regrettably justified. The file format IP issue we have discussed here is closely adjacent to the perennial question of what makes a standard ‘open’. As the Cabinet Office discovered this year, it’s a question with many answers, depending on which interest group you ask. Anyone interested in these issues might want to attend the European Commission’s workshop on open source and FRAND (the ‘fair, reasonable and non-discriminatory’ terms under which patents associated with recognised standards are often required to be licensed) next month.