Cultivating your project’s page on Wikipedia

I’m a Wikipedia editor with a couple of hundred edits and I’ve seen a number of people promote themselves on Wikipedia the wrong way. Wrong ways include (but are not limited to):

  1. Cutting and pasting advertising material or project descriptions into Wikipedia. This material is of dubious copyright status (hint: there are tools that automatically scan Wikipedia for text that appears elsewhere on the web) and is almost certainly written in the wrong style for Wikipedia.
  2. Rewriting an existing page from scratch. This repudiates the contributions and points of view of previous contributors to the article in favour of the contributions and point of view of a single contributor.
  3. Copying material from partisan documents into the article. Wikipedia strives to be an encyclopedia, which involves being fair, impartial and balanced wherever possible. If you must draw on partisan documents, you need to go out of your way to note their partisan nature.
  4. Writing an essay-style article on the benefits of the project. Wikipedia is not a forum for original research or original writing; it strives to be an encyclopedia and has a very specific encyclopedic style. This style is only loosely connected to that of an essay.

There’s a barrier to entry for all pages in Wikipedia—the subject of the article must be “notable.” Notability is a flexible concept, but if something is known to only a small number of people, is not written about by third parties or is essentially ephemeral, it may not be notable. Articles about non-notable subjects are quickly removed from Wikipedia, so it is important that the first version of the page address the reputation, impact or fame of the subject to establish notability.

If you’re writing about a standard, discuss the involvement of any standardisation bodies, numbers of implementations, numbers of deployments, etc. If you’re writing about software, talk about numbers of installs, downloads, turnover of the companies involved, etc. In all cases mention the institutions involved and any organisation or person linked to the project who already has a Wikipedia page, with links to those pages.

10 steps to improving your page on Wikipedia

  1. Get a Wikipedia account. By registering for an account, you show yourself willing to be held accountable for your edits. Other editors will be more willing to help you and you will have access to more pages. Your edits will also attract less attention from the automated bots that patrol Wikipedia for spam, copyright infringements and other badness.
  2. Add your page to categories. Look at the Wikipedia pages for other similar projects and see what categories they are in. If appropriate, add your page to the same categories. Categories are a major tool in navigating Wikipedia.
  3. Examine other pages in the same categories as your page to see what your page might aspire to. These are the pages in Wikipedia most similar in nature to yours, and they will give you ideas on how to improve your own (see the sketch after this list for one way to pull a category listing).
  4. Add an infobox (the table in the upper right), a standard way of representing tabular information. Infoboxes are a key way of tabulating information about organisations, places and other features in Wikipedia, and readers expect to see them. The exact content of the infobox will depend on the categories your page is in; check other pages in the same category and crib as many fields as apply to your project.
  5. Add links to third-party reviews, comparisons and news sources. These, and links from the text to particular references, are key to quality articles and article verifiability.
  6. Add disambiguation text. Often the same word or acronym can mean several different things in different fields. Disambiguation text at the start of the article ensures that readers are where they think they are and redirects them if necessary.
  7. Add article content, broken into useful sections and navigated by a table of contents (the wiki generates the table of contents automatically). Browse other pages in the same categories for suitable subject matter for your page.
  8. Make small incremental changes, unless you really know what you’re doing. Small improvements to a page are much easier to judge the quality of than large rewrites. They’re also more likely to engage other editors (rather than merely being reverted) if you break a guideline.
  9. Add links to/from other Wikipedia pages. Links from your page allow readers to explore related concepts in more detail. Links to your page mean more readers reading it.
  10. Encourage other members of your community to contribute. A single-author page will always have problems representing multiple points of view, problems that a multi-author page can mitigate. More authors and editors on a page mean more content, better proofreading and more links.
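
As a small illustration of step 3, here is a minimal Python sketch that uses the MediaWiki web API to list the other pages in a given category, so you can browse them for structure, section headings and infobox fields worth cribbing. The category name below is only an example; substitute the categories your own page sits in.

    # Minimal sketch: list the pages in a Wikipedia category via the MediaWiki API,
    # so you can study how similar projects structure their articles (step 3 above).
    # The example category name is illustrative -- substitute your own.
    import requests

    API = "https://en.wikipedia.org/w/api.php"

    def category_members(category, limit=50):
        """Return the titles of pages in the given Wikipedia category."""
        params = {
            "action": "query",
            "list": "categorymembers",
            "cmtitle": f"Category:{category}",
            "cmlimit": limit,
            "format": "json",
        }
        response = requests.get(API, params=params, timeout=10)
        response.raise_for_status()
        return [page["title"] for page in response.json()["query"]["categorymembers"]]

    if __name__ == "__main__":
        for title in category_members("Free learning management systems"):
            print(title)

Running this against a couple of the categories your page belongs to gives you a reading list of comparable articles in a few seconds.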

Exemplary Wikipedia pages can be found for Moodle, LDAP and Oxford University.

If you’ve got specific questions about specific pages you’re interested in, give us a shout at info@oss-watch.ac.uk

XCRI: standard course information

At the recent IWMW, I went to a session on XCRI. Unfortunately I was too busy listening to take detailed notes, and the presentation slides don’t appear to be on the web.

XCRI is a new standard for exchanging post-compulsory course information. Universities, further education, adult learning centres, vocational agencies and continuing professional development providers can all publish information about their courses, enabling careers advisers, institutions and government agencies to find the relevant information on courses in order to encourage people to enrol in them.

Previously there was no standard format for such information, and the main consumers of it all require it in different forms. UCAS is a major consumer, as are any number of government schemes aimed at increasing the take-up of educational opportunities and regional development programmes aiming to tackle unemployment through retraining and upskilling. Institutions also typically have their own course catalogue of some description. Keeping all of these in sync, both with each other and with what students of the course actually get taught, is a significant challenge.

XCRI is an XML standard similar in nature to Atom: (a) it’s plain XML (for those who want to keep things simple) with a mapping to RDF (for those wanting generalised knowledge representation); (b) it has as few tags as possible, and wherever possible those tags reuse definitions widely used elsewhere; (c) a feed is a list of items.

To make publishing XCRI easier, the standard assumes (but doesn’t enforce) that the feed is merely a text file on a webserver representing all of an institution’s forthcoming courses. This is to explicitly encourage batch export and validation of XCRI from legacy systems, which is expected to be the dominant form of generation for most institutions for some time.
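
To give a feel for what such a batch export might look like, here is a rough Python sketch that turns a CSV dump from a hypothetical legacy course database into a single static XML file for the webserver to serve as the feed. The element names used here are illustrative placeholders, not the actual XCRI vocabulary; consult the specification for the real tags.

    # Sketch of a batch export: convert a CSV dump from a legacy course database
    # into one static XML feed file for the webserver to serve. The element names
    # are illustrative placeholders, NOT the real XCRI vocabulary.
    import csv
    import xml.etree.ElementTree as ET

    def build_feed(csv_path, xml_path):
        root = ET.Element("catalog")  # placeholder root element
        with open(csv_path, newline="") as f:
            # Assumes the legacy export has columns: id, title, start, provider.
            for row in csv.DictReader(f):
                course = ET.SubElement(root, "course", id=row["id"])
                ET.SubElement(course, "title").text = row["title"]
                ET.SubElement(course, "start").text = row["start"]
                ET.SubElement(course, "provider").text = row["provider"]
        ET.ElementTree(root).write(xml_path, encoding="utf-8", xml_declaration=True)

    if __name__ == "__main__":
        # Write the feed somewhere the webserver already serves static files.
        build_feed("courses.csv", "courses-feed.xml")

A nightly job regenerating the file, plus a validation pass before it is published, would be enough to keep such a feed in step with the legacy system.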

XCRI is a new standard, and its website is still under construction, but some of the community members have websites with decent information on XCRI. Indeed, the community building around XCRI is very impressive, with support from a wide variety of institutions.

If you’ve got an open source or open development project you’re trying to build a community around, why not join the new community-development mailing list that we at OSS Watch have recently started? Unfortunately, no, we can’t claim the success of XCRI had anything to do with us, but we can certainly answer your questions and give you pointers.

OSI approves new attribution-centric licence

The OSI has approved a new licence with a focus on attribution. Groklaw have a fine-grained analysis.

The key question is whether copyright holders can include a clause in their licence forcing users to display advertising for the copyright holders and still have the licence receive the official “open source” stamp of approval from the OSI. Such advertising directly undermines the position of any direct competitors to the copyright holders in the marketplace, but it also limits the usefulness of the software in many situations. This issue matters most to commercial companies with business models based on dual licensing in crowded marketplaces, such as the Customer Relationship Management (CRM) and Content Management System (CMS) marketplaces.

Ironically, the SugarCRM licence, which Groklaw use as a comparison, may already be outdated. I posted yesterday that SugarCRM have announced a move to the GPLv3.

Moodle Moot 2007

Registration is now open for Moodle Moot 2007, the original Moodle gathering. Held at Milton Keynes for the second year following a move from Oxford, the Moot runs from 23 to 25 October. Paper submission is still open.

Moodle is a pedagogically driven open source Virtual Learning Environment (VLE). With a strong focus on learning and a single-click installation, Moodle’s growth is being driven by teachers, and on my recent trips to several RSC events most further education institutions in the UK seemed to be using it.

If you’re doing anything exciting with Moodle, be it technically exciting, pedagogically exciting or socially exciting, come to the Moot and tell us about it. Institutions are encouraged to send a pair of people to Moodle Moot, one techie and one teacher (or one geek and one pedagogist, to use the cant).

Register or check out pictures of last time(s).

I’m going again this year. This will be my third (or maybe my fourth?). I’m hoping I get to present this year rather than being a dogsbody again.

SugarCRM goes GPLv3

Open source customer relationship management system SugarCRM is switching to the new version of the GPL, the GPLv3, for the next release of their software.

SugarCRM is a dual-licensed open source application, with both open source versions and proprietary versions that have additional functionality, support and other benefits. By releasing the low-end community version for free under the GPLv3, SugarCRM enables users to use the software in small businesses, small projects and non-mission-critical roles. When those businesses and projects grow, and when those roles become mission critical, users can upgrade to the proprietary versions of the software. The community version is released under the GPLv3, forcing commercial competitors who seek to build competing products on it to release the source code of any improvements they make, leaving SugarCRM in a privileged position over any such competitors.

The GPLv3 is the third version of the GNU General Public License, released by the Free Software Foundation. The previous two versions have been very popular open source licences.

The acrimonious debate over earlier drafts of the GPLv3 threw up questions about how popular it would be once released, but take-up by high-profile projects such as SugarCRM suggests that the GPLv3 is going to enjoy popular support just as the previous versions of the GPL have.

Useful Web2.0 Community Building Tools

In a recent post I described my worries about Web2.0 social networking tools like LinkedIn, Facebook, MySpace, Bebo et al. In that post I asserted that in order to be truly useful, social networking sites need to open up and share more of their data. I predicted that only those sites that go this way will survive.

In this post I’ll look at an alternative future to the one in which a small number of social networking sites win out. In this alternative scenario there is no need for “one site to rule them all”; instead we get a whole range of useful tools that allow communities to build their own social networking sites to support their specific needs.

There are a growing number of Web2.0 social networking tools (as opposed to sites) that look really interesting, at least to me. They provide small units of functionality that I can quickly and easily embed in my chosen web desktop in order to create my own social networking “site”. I recognise that some of the current big players in this space are heading in this direction. They provide “applications” that can be plugged into each user’s version of their site. However, these sites suffer from the “walled garden” problem I discuss in the earlier post. In my alternative scenario the tools are focussed on doing one job, and doing it well (note this is much closer to what the term Web2.0 originally meant than what many people think it means today). Each tool can be seamlessly integrated as and when needed, because the user owns the data and the tool is either open source or has an extensive open API (note, I’m not claiming we are there yet; this is a potential future scenario).

Currently on my personal list of “potentially useful” are:

  • Dopplr – know when your schedule takes you to the same place as others in your social network (the bar or dinner table is still the most effective social networking tool)
  • Twitter – took me a while to “get it” but, as Brian Kelly recently described it, it is an online “two minute water cooler chat” – let’s see if I agree with this view in a few months’ time
  • del.icio.us – social bookmarking, now if only I could figure out how to mine this resource for good data
  • Google Calendars – it really does save time if people can see your availability without sending another mail

I realise there is nothing particularly new in this list; I’m a follower in this space, not a leader. I provide my list to illustrate what I consider to be potentially useful tools. I believe each of these tools may, over time, find a place in my social networking armoury, alongside my current tool set, which consists of:

  • email for discussion
  • web sites (with blogs and RSS feeds) for publication
  • a revision control system for document and, where appropriate, source management
  • an issue tracker to manage community objectives, resources and strategy

What makes these new tools stand out from the others in this space? For me the common factors in these new tools are:

  • they provide something email can’t
  • they provide an API that allows me to do what I want with the data (actually Dopplr doesn’t, but RSS and iCal feeds do allow access to the data generated by the tool – see the sketch after this list)
  • they don’t require me to sign up to and navigate a gamut of tools I don’t want, just to get at the one I do want
  • they let me define my own processes for community development and interaction
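
Here is the sketch mentioned in the second point above: a minimal Python example that pulls the latest items out of whatever RSS or Atom feed a tool exposes, using the third-party feedparser library. The feed URL is a placeholder, not a real endpoint.

    # Minimal sketch: pull items out of whatever RSS/Atom feed a tool exposes, so
    # the data can be reused outside the tool itself. Requires the third-party
    # 'feedparser' package; the URL below is a placeholder, not a real endpoint.
    import feedparser

    def latest_items(feed_url, count=5):
        feed = feedparser.parse(feed_url)
        for entry in feed.entries[:count]:
            # Not every feed supplies every field, so fall back gracefully.
            title = entry.get("title", "(untitled)")
            link = entry.get("link", "")
            yield f"{title} {link}".strip()

    if __name__ == "__main__":
        for line in latest_items("https://example.org/my-travel-feed.rss"):
            print(line)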

I’d love to hear from people about social networking tools they find useful, especially if they satisfy my criteria above.

(If anyone is interested in finding me on these tools, it is useful to know that I always use the username “rgardler”, except on Google – mail me if you need access to my calendar.)

Community and Web 2.0 Social Networking

At the recent Institutional Web Managers Workshop (IWMW) in York I attended a discussion session led by Brian Kelly entitled Building the Web Managers Community. This session allowed participants to discuss Web 2.0 social networking tools and their role in supporting a community of web managers in sharing best practice and experiences.

My interest in this session stems from my long history of involvement in sustained open source development using a community led model. In this domain only four tools are needed to facilitate the community:

  • mailing lists (or in some cases forums) – the fewer the better
  • a web site (which may or may not contain a blog but should provide RSS feeds of news)
  • a repository for documentation and, where appropriate, source code
  • an issue tracker to help plan and monitor organised community activity

Despite there being only four tools in this list, a process for their use is vital. None of these tools are of any use without a process describing how the community operates and is governed. None of the communities I have examined in the various social networking sites have such a process, nor such a limited (but highly effective) set of tools at their disposal. This is true of all “social networking” communities I’ve seen, including those set up by people from successful “old web” communities.

Process is critical (note process need only be guidance in the form of best practice, it need not be rules). It is critical because newcomers to the community need to have a very clear understanding of which tools are used for what purpose. Without this process, and without tight control over the proliferation of overlapping tools available to the community, information is spread far and wide. Consequently its exposure, and thus its usefulness, is limited.

For a community to succeed, participants must get more out than they put in, so anything that limits the exposure of useful resources is a major concern.

Given that the four tools listed above, coupled with a clear description of their use patterns, have led to the development of highly successful and long-lived communities within open source software development, I wonder if we really need more.

It is worth noting that members of those “old web” communities often experiment with social networking tools. I recall myself and many of my Apache Software Foundation colleagues getting all excited over one of the very early social networking tools, Orkut, when it launched in early 2004. Despite our early enthusiasm, we quickly grew bored of it and returned to our tried and tested tools. Since then I (and many of my peers) have experimented with LinkedIn, MySpace, Facebook, Orkut (again) and a handful of other such so-called social networking tools. I have to say the functionality, and more importantly the utility, of these sites has not really changed a great deal since those early days. I still stick to my four base tools within all my real community work.

In addition to the proliferation of tools within these environments, I am also highly worried by the fact that these tools are walled gardens. Back in the early days of Orkut there was a hope that it would be the “one and only”, or that it would create open standards for all that followed, yet I (and to a large extent we, where “we” means existing members of successful online communities) rejected it.

Today it is clear that there are a large number of walled gardens in the social networking domain. This is a problem because if I’m not a member of a site, I can’t play with the communities within it. Since I don’t want to join and manage up to a dozen different social networking sites, I have to pick and choose which I join in with. This is contrary to a basic principle of community-led development – that everyone is welcome as long as they are willing to play by the “rules”.

If the tools are not fully open with respect to sharing their data across social networks (where permission is granted), then they are no use to my community development work. So again, nothing is compelling me to move away from the tried and tested tools listed above.

This whole “walled garden” mess reminds me of my early days using computers and modems to communicate. First of all I used bulletin boards (reasonably open thanks to FidoNet), then I joined CompuServe (totally closed), but had difficulty communicating with those on AOL, Prestel and others. Then came open standards and the Internet, and gradually the walls began to fall. I quickly left CompuServe and switched to newsgroups and email, and I’ve never looked back since.

Supporters of online social networking often argue that there are runaway success stories that indicate this is a movement that cannot fail. I disagree. In their day CompuServe, Prestel and many other such networks were successes, but only AOL can be thought of as a survivor from those early days. Today AOL is not the walled garden it once was; it has remodelled itself in order to survive. [I won’t even bother pointing at the dot-com bubble and all the “runaway success” stories within it that later bit the dust.]

I predict only one or two of the current Social Networking sites will survive, and they will be the ones that share their network data first.

Report: Web Usage Statistics in the University Environment

This was a well-attended workshop in the parallel sessions, and the attendance wasn’t just down to proximity to the previous coffee break, because the room was well hidden in the maze-like Vanbrugh building. This is my report; obvious factual errors are most likely due to me mishearing, mistyping or misunderstanding. If you spot any errors let me know.

Paul Kelly and William Mackintosh described the process the University of York went through in looking at alternatives to their current combination of Analog and Report Magic. The current system is stable, has good report-writing functionality and is free, but it is a “one size fits all” solution giving all units the same reports, it is poorly understood by staff and there are doubts about the effort required to maintain the system in the medium to long term.

In selecting alternative solutions to explore, both open source and proprietary systems were examined. Open source systems were seen as carrying concerns about overhead and maintenance. Two proprietary systems, Nedstat and Google Analytics, were selected for more in-depth study.

Both systems use JavaScript and cookies to report user activity to third-party servers, rather than relying on analysis of webserver logs. The appropriate magic for both systems was added to a range of webservers under their control, and their performance was compared, both to each other and to the previous system.

The first and most obvious result was a huge drop in the number of “hits.” Both systems only count clients that run the JavaScript on web pages and allow third-party cookies, thus excluding: users with conservative cookie settings, users with JavaScript turned off, users whose browsers don’t support JavaScript or cookies (or both), and web crawlers. Examination of the (existing) logs suggested that the majority of the drop was attributable to web crawlers. The impact of web crawlers was increased by the use of an institutional Google Search Appliance.
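
To give a feel for how much difference excluding crawlers makes to log-based figures, here is a rough Python sketch that counts requests in an Apache-style combined access log, with and without hits from a few well-known crawler user agents. The bot list and the regular expression are simplified assumptions, not a complete or authoritative filter.

    # Rough sketch: count requests in an Apache-style combined access log, with
    # and without hits from a few well-known crawlers, to show how much of the
    # raw "hits" figure can be robot traffic. The bot signatures and regex are
    # simplified assumptions, not an authoritative filter.
    import re

    # 'gsa-crawler' typically appears in a Google Search Appliance's user agent.
    BOT_SIGNATURES = ("Googlebot", "Slurp", "msnbot", "gsa-crawler")

    # Combined log lines end with: "request" status size "referer" "user-agent"
    LOG_LINE = re.compile(r'"[^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"\s*$')

    def count_hits(log_path):
        total = human = 0
        with open(log_path, errors="replace") as log:
            for line in log:
                match = LOG_LINE.search(line)
                if not match:
                    continue
                total += 1
                if not any(bot in match.group("agent") for bot in BOT_SIGNATURES):
                    human += 1
        return total, human

    if __name__ == "__main__":
        all_hits, human_hits = count_hits("access.log")
        print(f"all hits: {all_hits}, excluding known crawlers: {human_hits}")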

The old system was not turned off, because there was seen to be a need to continue to generate the old metrics for on-going reporting.

Both systems had path analysis tools, but they could only be used to answer questions (or test hypotheses) already posed. For example, in a debate over whether units of enrolment should be called “courses” or “programs,” both could be tried and the systems used to determine whether users found the appropriate material without backtracking. It was seen to be important that only one change be made at a time. Google calls this “goal analysis.”

The Nedstat representative present mentioned that this idea is taken to the extreme by certain commercial modules, such as Mercado and TouchClarity, which integrate the statistics engine and the CMS to allow dynamic alteration of many page features to optimise for particular goals. Typical alterations might include changing the size, colour and placement of buttons and links.

The JavaScript-and-cookies approach had difficulties with non-scriptable content (e.g. PDFs and word processor files). Local references could have the necessary JavaScript added to the link, but this doesn’t work for non-local references, because the webmaster is not in control of the link. [Maybe mod_rewrite could be used to redirect users through such JavaScript?]

Privacy was raised, and it was mentioned that Google Analytics requires that sites using it have a privacy policy.

We went around the room introducing ourselves and outlining what systems we used. Here’s a quick tally of the systems. I’ve counted pretty much every system mentioned in a non-technically-disparaging manner, while attempting to avoid duplication where two people were from the same institution. In the order in which they were first mentioned:

Nedstat 8
Analog 12
Google Analytics 15
Evisit Analyst 1
WebTrends 3
Webalizer 1
Site Meter 1
AWStats 2
Netinsight 1
Log Analyser Pro 1

Paul and William would not be drawn on which system York was likely to go with, but seemed confident they would go with one or the other.

Paul and William after the workshop

OSS Watch at IWMW

This week, OSS Watch is at IWMW, the Institutional Web Managers Workshop in York. We’re catching up on all things webby and chewing the fat with old friends like Brian Kelly and JISC InfoNET; Ross and Andrew are presenting a paper and I’m taking some photos.

The weather last night left something to be desired, but since the sun burnt the fog off this morning it’s not been too bad.


Picture of central York early this morning, with the fog-shrouded Minster looming above.

keywords: iwmw, iwmw2007, york

Samba’s Adoption of the GPL v3

On 9 July 2007 the Samba project announced that future releases of their software would be made under the Free Software Foundation’s newly-minted GNU General Public License v3. Previous releases had come under the GPL v2. Most people who have used Linux will be familiar with Samba. It enables Linux, as well as many other Unix variants, to access and provide file and print services to and from computers running Microsoft Windows. To put it another way, it is an implementation of the Microsoft-created SMB/CIFS protocols, created through reverse engineering and reference to publicly available documentation. Samba forms an essential part of the Linux software stack, given how widespread Windows machines are, and this is reflected by its inclusion in nearly all Linux distributions.

Why is this licence swap a big deal? There are a couple of reasons. Firstly, despite succeeding the most widely used free and open source software licence (the GPL v2), there remains some doubt over how popular the GPL v3 will be. Some figures in the world of free and open source software – notably Linux creator Linus Torvalds – have expressed doubts about the new licence’s terms. Torvalds has in the past said that he will be keeping the Linux kernel under the GPL v2 as a result of these doubts. Thus ever since the GPL v3’s final publication on 29 June 2007, commentators have been waiting eagerly to see which free and open source projects will move to the new licence and thereby contribute to its success. Samba’s conversion is a big win for the Free Software Foundation and its aim of propagating GPL v3 usage as widely as possible.

There’s another reason that Samba’s announcement is big news, and it relates to the complex story of Microsoft’s patent covenant deal with Novell. Back in November 2006, Novell and Microsoft announced that they had struck a deal to promote Windows/Linux interoperability and support each other’s customers. As part of this deal, Novell and Microsoft promised not to sue each other’s customers over the use of their patents; both Microsoft and Novell hold extensive software patent portfolios. The deal antagonised many in the free and open source community, for a couple of reasons.

Firstly it seemed to have been formulated specifically to avoid an essential provision of the GPL v2. Distributors of GPL v2 software must either distribute the software freely, or not at all. If – for example – a patent judgement forced the distributor to slap an additional ten pence charge on top of the software in order to pay for a patent licence, the GPL v2 says that the distributor would have to stop distributing the software entirely. In this way the GPL v2 discourages patent actions against distributors by making them unlikely to yield any real cash for the patent owners. It also prevents a single distributor becoming ‘favoured’ over others through the granting of a patent licence to them. Now with this customer-aimed cross-licensing deal, Novell and Microsoft had found a way around this provision. Neither company was making direct payment to the other for the use of patents – they were merely making a promise not to take each other’s customers to court. As a result of the deal, Microsoft managed to give the impression that all Linux users – with the exception of Novell’s customers – were illegally making use of Microsoft patented technology and might very well get sued one day. Novell, whose Linux business had been struggling, had managed to give the impression that theirs was the only ‘safe’ Linux available, if you believed their reasoning…

It was this second reason – the designation of some Linux users as safe from Microsoft legal action while others had to live in doubt – that most annoyed the free and open source software community. Some feared that it was the beginning of a new anti-Linux strategy by Microsoft, replacing their failed strategy of criticism and denial. If Microsoft could successfully give the impression to the world that you need a patent licence from them to run Linux (or more accurately a promise or ‘covenant’ not to sue) then they could exercise control over the uptake of Linux, favouring distributors whom they liked or who had something to give in return with covenants while scaring customers away from everyone else’s distributions with implied threats of patent action.

Now at the same time as Novell and Microsoft announced their deal, the Free Software Foundation was grinding through its public consultation process over the rewriting of the GPL v2. This process had originally been scheduled to complete in late 2006, but the opposition to the wording of the original draft had been wider and angrier than expected. By November 2006 it was clear that the timetable had slipped. This tardiness became a blessing in disguise once the true potential of the Microsoft-Novell deal became clear. There was now time to incorporate changes into the new GPL that would broaden its provisions and explicitly prohibit such patent covenants. It would not catch the current Microsoft-Novell deal, but it would close a loophole that many felt might in the future threaten the entire free and open source software eco-system.

Eben Moglen, the Free Software Foundation’s lawyer, redrafted the GPL v3 to cover the more indirect benefits that such deals provide. If x was promising not to sue y‘s customers, the new version said, that counted as payment, and therefore y could either try to get the same benefit for all distributors and their customers worldwide or – if they could not – they would have to stop distributing.

Following all this so far? Good. Now you might think that this was an exercise in closing the stable door after the horse has bolted – after all Novell and Microsoft struck their deal before the GPL v3 came out. It’s not quite that simple though. What happens when Novell want to include newly-released free or open source software in their distribution and it’s licensed under the GPL v3? The simple answer is that they can’t. Their patent deal violates the terms of the GPL v3, so they cannot be distributors of GPL v3 software. How practical is this as a position for a distributor? That depends on which software moves to the GPL v3. If a major component of Linux changed licence, Novell would be forced to break off the deal with Microsoft or drop the component from their distribution…

It’s not really surprising that Jeremy Allison, Samba’s lead developer, was the first to take advantage of Novell’s uncomfortable position. Allison quit his job with Novell last November in protest at the Microsoft deal. By moving Samba to GPL v3, Allison has put Novell – and the other Linux distros who have made similar deals with Microsoft, like Linspire, LG and Xandros – in a dark corner. They can either keep their ‘special’ status or they can include Samba. They cannot do both.

And what of Microsoft? They may be just as keen to drop the deal as their newly-negotiated friends. You see, as part of the original Novell deal, Microsoft bought and distributed ‘support vouchers’ for Novell’s Linux. These vouchers entitle the owner to receive support for Novell’s Linux from Microsoft, including bug fixes. They were intended to cement Novell’s position as Microsoft’s favoured Linux partner. The vouchers are – apparently – undated and have no use-by date. What this means is that Microsoft may already be legally committed to distributing GPL v3 code (or more accurately procuring distribution from Novell, an act also covered by the GPL v3) in the form of patches for Novell’s Linux distribution. Microsoft recently released a statement on this issue in which they argue – somewhat confusingly – that they are not GPL v3 licensees despite having made commitments to support Linux in the form of the vouchers. Microsoft also said that they won’t be supporting any GPL v3 code. At the same time Novell issued a statement saying that their Linux distributions will include GPL v3 code regardless of what Microsoft’s statement implies. This new friendship may already be on the rocks, thanks to the Free Software Foundation.

Note: Although I’ve used the word ‘distribution’ throughout this post, the GPL v3 actually replaces the word ‘distribution’ with ‘propagation’, and defines that activity more broadly. For the purposes of clarity, and until we’re all used to the new vocab, I’ve stuck with ‘distribution’ here.