ICANN to start Internationalised Domain Name testing

ICANN’s plan to roll out domain names in non-western scripts is moving ahead: it is about to release trial top-level domain names using the word ‘test’ translated into Arabic, Persian, Chinese (simplified), Chinese (traditional), Russian, Hindi, Greek, Korean, Yiddish, Japanese and Tamil. The trial top-level domains will be retired once production domains in these scripts are rolled out:

It is planned that the .test labels will be kept in the DNS root zone and resolving with example
positioned at the second level (i.e., translations of example.test) until registrations in a
corresponding script are available in a production environment. Although it is anticipated that
the evaluation facility will be of short-term utility the lifespan of the evaluation may be
extended if it is demonstrated that target groups will derive continuing benefit from it.

I’m expecting that many, many applications which touch DNS are going to have to be patched to fix bugs that will be shown up by this change, so expect a wave of patches and updates 2-6 months from now. Well-behaved applications which handle DNS by calling system libraries are likely to be OK, since this change has been in the wind for ~3 years and the maintainers of specialist libraries should be prepared for it.
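The crux of the change, for code that handles hostnames itself rather than leaning on the platform, is that non-ASCII labels never travel raw in the DNS: they are converted to an ASCII-compatible encoding first. Here is a minimal Python sketch of that conversion; the hostname is illustrative only, not one of the actual trial domains:

    # Convert a Unicode hostname to the ASCII-compatible ("punycode")
    # form that actually appears in DNS queries. The hostname below is
    # illustrative, not one of the real ICANN trial domains.
    unicode_host = "b\u00fccher.example"  # i.e. "bücher.example"

    # Python's built-in "idna" codec encodes label by label, prefixing
    # encoded labels with "xn--".
    ascii_host = unicode_host.encode("idna").decode("ascii")
    print(ascii_host)  # prints "xn--bcher-kva.example"

    # Applications that pass hostnames straight to the system resolver,
    # e.g. socket.getaddrinfo(unicode_host, 80), inherit whatever IDNA
    # support the platform libraries provide: the "well behaved" case.

Applications that do their own string munging on hostnames (validation regexes, case folding, length checks) are the ones I’d expect to break first.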

I’ve previously written about ICANN’s failure to move on this and other topics, but it looks like I may have been too quick to criticise them.

Communities can’t flourish in walled gardens

I recently posted on the dangers of using closed Social Networking sites to develop community. Since writing that post I have entered into numerous discussions about my position. These discussions have taken place both in “walled garden” tools and in open channels such as mailing lists, direct email and the blogosphere. In that time I have listened to, and learnt from, many different views, and I’ve started to come to the conclusion that, well, I’m only partially right when I say:

I predict only one or two of the current Social Networking sites will survive, and they will be the ones that share their network data first.

Randy Metcalfe pointed me at a BBC story in which Michael Geist agrees with my “sharing data” point:

The better approach – for users and the sites themselves – would be to work towards a world of interoperable social networking.

However, Michael disagrees, like many others, with my claim that only one or two Social Networking sites will survive:

Some services may believe that it is in their economic interest to stick to a walled garden approach; however, given the global divisions within the social networking world, the mix of language, user preferences, and network effects, it is unlikely that one or two services will capture the global marketplace.

I agree with Michael, and others. There is lots of room out there for niche players. What we need are open standards for creating interoperable networks.

I recently started reviewing open source social networking tools and the standards they adopt. This will be published as an OSS Watch briefing paper sometime in the next couple of months. Please let me know, via the comments, about any social networking/news/bookmarking tools you think I should look at.

Depositing documents in repositories: Which repositories should we use?

Recently during a discussion with Pete Cliff from RSP, the question arose “which repository, if any, should OSS Watch be putting our documents in?” The possible answers were:

  1. don’t put them anywhere
  2. put them in an institutional repository
  3. put them in a subject-specific repository
  4. put them in a funder-specific repository
  5. put them in a creative commons repository
  6. some combination of the above

I’ll confess straight off that I’m enough of a bibliophile and library-lover that the first option doesn’t appeal; besides, our entire remit at OSS Watch is dissemination-based, and it makes no sense to hide our outputs under a bushel. Repositories are to open access what version control systems are to open source: it’s very hard to argue against their use except where copyright is unclear, which is generally a sign of larger problems.

Institutional repositories are something I’m less than confident about.

Recently I completed my PhD in Computer Science at the University of Waikato, and ingest of my thesis into the institutionally supported “Australasian Digital Theses Program” repository was a seamless part of the submission and degree-granting process. It just worked; the only slight wrinkle was that I wrote in LaTeX rather than the Word format the documented submission process assumed. I have confidence that in a hundred years a copy of my thesis will still be in existence (I have less confidence it will be of interest, but never mind).

Once I was awarded my degree I also deposited the thesis in my employer’s institutional repository (Oxford Eprints), primarily to increase the chance that future Research Assessment Exercise-type activities would also “just work.” Several months after depositing with Oxford Eprints, I got an email from OULS saying that they were migrating from Oxford Eprints to the Oxford University Research Archive, and that my deposit did not meet the collections policy for the new archive, so it would be dropped. Presumably the new archive either isn’t going to be used for RAE-type work, or PhD theses are not considered research. Dropping works from an archive seems like a crazy policy to me, since I always understood the difference between a library and an archive to be that an archive didn’t drop deposits after ingest.

Thus I have very mixed feelings about institutional repositories.

Subject-specific repositories can also be very effective. arXiv.org is the canonical example here. It’s a huge repository of peer-reviewed papers in the physical and mathematical sciences, built to overcome the problem that in hot topics research moves faster than journals can be published on paper. arXiv.org has been around for more than ten years and remains well-supported, well-trusted and well-used. The problem I have with subject-specific repositories is that they are insular: they further entrench the two cultures and act as barriers to communication.

I strongly believe that the currently well-funded physical sciences would benefit from a little cross-subsidising of the infrastructure of currently less well-funded fields. Look at Middle-Eastern studies in western universities before and after 11 September 2001. Literally overnight (at least for those of us down under at the time), this field became one of huge popular, political and academic interest. Shared infrastructure enables hot topics to scale up rapidly. Let’s not forget, either, that during the western European dark ages (when Galileo Galilei was on the Index Librorum Prohibitorum) it was the Middle East that kept alive the foundations of those currently well-funded physical sciences, to the extent that the field of algebra is named after the work of Muhammad ibn Mūsā al-Khwārizmī, who worked in Baghdad, and the text of “On Divisions of Figures” by Euclid is known only by re-translation from the Arabic.

There has been recent discussion of JISC setting up a repository for the outputs of work they fund. This would allow them to ensure that materials they fund can be found and promoted as appropriate. It would also greatly ease enforcement of the rules about making outputs available to UK higher and further education. For the fundees, a repository would enable them to quickly and easily provide evidence of their project outputs. In the medium to long term I have my doubts about such a repository, however. While I have no doubt that Oxford University and the field of physics will be around in 100 years, I’m not confident that the JISC will be. Unless it’s clear where the repository will be in 100 years, I’m a little hesitant. Having it hosted by a research institution which is paid up-front to preserve the contents in perpetuity would be adequate. The JISC already funds Jorum, a repository of learning content, but my understanding is that unlike research outputs, learning content is not expected to be archived in perpetuity: it’s a library rather than an archive.

There are a number of archives using Creative Commons licences (and one or two others) as a political message. These have a particular resonance with the open archives, open source and open content movements, in that they actively work to promote content reuse and thoughtful use of copyright, and to build communities around shared content. While I have sympathies with these political messages, the archives are typically of uncertain long-term sustainability. In a hundred years the political message is unlikely to be relevant, and while some may adapt and evolve, many are likely to wither and vanish. If we’re lucky some will get absorbed into institutions which already have long-term sustainability.

Placing a document in multiple repositories has benefits: you potentially get the advantages of each repository at the cost of redepositing the document. By creating two sets of metadata describing a document during deposit, there is the potential to later use that metadata for cross-walking between the repositories (this only works, of course, if the document has a unique id across the repositories). But by diluting the unique holdings of a repository, you make it a less attractive target for preservation funding or for absorption into larger repositories. Note that automated ingest from one repository to another, or deposit into two repositories with the same metadata schema, doesn’t give you the cross-walking benefit, since what you’re really building is a mapping from one metadata schema to another based on instances which have been independently human-classified.
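As an aside on the cross-walking idea, here’s a minimal Python sketch of how such instance-based mapping might work. The repositories, field names and values are all hypothetical; the point is only that agreement between independently entered metadata, joined on the shared unique id, is what carries the signal:

    from collections import Counter

    # Hypothetical metadata records from two repositories, keyed by a
    # shared unique id (e.g. a DOI or handle). Field names are illustrative.
    repo_a = {
        "doc-1": {"creator": "A. Author", "subject": "repositories"},
        "doc-2": {"creator": "B. Writer", "subject": "metadata"},
    }
    repo_b = {
        "doc-1": {"author": "A. Author", "topic": "repositories"},
        "doc-2": {"author": "B. Writer", "topic": "metadata"},
    }

    def candidate_mappings(a, b):
        """Count how often a field in repo A holds the same value as a
        field in repo B for the same document: evidence for a crosswalk."""
        votes = Counter()
        for doc_id in a.keys() & b.keys():
            for f1, v1 in a[doc_id].items():
                for f2, v2 in b[doc_id].items():
                    if v1 == v2:
                        votes[(f1, f2)] += 1
        return votes

    for (f1, f2), n in candidate_mappings(repo_a, repo_b).most_common():
        print(f"{f1} -> {f2} (agree on {n} documents)")

If the second set of metadata was machine-copied from the first, the two sides agree by construction and the derived mapping tells you nothing, which is why automated deposit doesn’t help here.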

There are a number of technical and consortium-based approaches which would solve many of these problems. The US, UK or EU science funding bodies may decide that explicit direct funding for repositories, with scope to match that of research libraries, is beneficial in the short to medium term. Automated OAI-PMH techniques may evolve to the point where researchers expect full local mirrors of all significant archives globally (consider the UK mirror of arXiv.org). The large science publishers may realise that the oncoming light is not the end of the tunnel. Copyright law may get completely rewritten. But none of these are directly relevant to the question of which repository we should be putting our documents in right now.
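For the mirroring scenario, the mechanics already exist: an OAI-PMH harvester polls a repository endpoint and pages through its records. A minimal Python sketch follows; the endpoint URL is arXiv’s published OAI-PMH interface as I understand it, but treat it (and the lack of error handling and rate limiting) as illustrative assumptions:

    import urllib.request
    import xml.etree.ElementTree as ET

    # OAI-PMH ListRecords request in Dublin Core; the endpoint is
    # believed to be arXiv's OAI-PMH interface, used here illustratively.
    url = ("http://export.arxiv.org/oai2"
           "?verb=ListRecords&metadataPrefix=oai_dc")

    with urllib.request.urlopen(url) as response:
        tree = ET.parse(response)

    ns = {
        "oai": "http://www.openarchives.org/OAI/2.0/",
        "dc": "http://purl.org/dc/elements/1.1/",
    }
    for record in tree.iterfind(".//oai:record", ns):
        title = record.find(".//dc:title", ns)
        if title is not None:
            print(title.text)

    # A full harvester would also follow resumptionToken elements to
    # page through the complete archive, and respect the repository's
    # rate limits.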

Ideas anyone?

Simal to release version 0.1

Simal, an open source project using DOAP/RDF files to reflect and build software-oriented communities, has a release candidate for its 0.1 release, and it looks like we’re about to get it out the door. The idea is that Simal takes RDF-encoded descriptions of projects and uses them to build small-scale community functionality. Much of the available RDF-encoded information about projects is actually in RSS feeds (think blog feeds, version control feeds, wiki change feeds, mailing list feeds, etc.), and all of these will, eventually, be available for inspection and aggregation.
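For a flavour of what those RDF-encoded project descriptions look like, here is a minimal sketch that builds a tiny DOAP record with the Python rdflib library. The DOAP namespace and property names are standard; the project URL and descriptive values are placeholders, not Simal’s actual metadata:

    from rdflib import Graph, Literal, Namespace, URIRef
    from rdflib.namespace import RDF

    # The standard DOAP vocabulary namespace.
    DOAP = Namespace("http://usefulinc.com/ns/doap#")

    g = Graph()
    g.bind("doap", DOAP)

    # Illustrative project description; the URL and text are placeholders.
    project = URIRef("http://example.org/projects/simal")
    g.add((project, RDF.type, DOAP.Project))
    g.add((project, DOAP.name, Literal("Simal")))
    g.add((project, DOAP.shortdesc,
           Literal("Builds community views from DOAP project descriptions")))
    g.add((project, DOAP.homepage, URIRef("http://example.org/simal")))

    print(g.serialize(format="turtle"))

A tool like Simal can then aggregate many such records into project listings, contributor views and so on.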

I like to think of Simal as the semantic-web hybrid of freshmeat and planet. I’d like to be able to build freshmeat-like displays of, for instance, all recent releases of all the software my software depends on, all their contributors’ blogs aggregated together, or all their high-priority bugs. But it’s still very early days yet.

Even though only the most basic functionality is present, in the most limited of ways, we’ve already had some third-party participation, which, of course, is the key benefit of the “release early, release often” approach to software engineering.

I’ve attached a presentation about Simal based on the one I gave at last week’s Google Open Source Jam 3 in London: Simal presentation