How to get your code into an open source project

One of the things I've observed since I started working at Red Hat is that even seasoned programmers who are not used to the "open source thing" can become very confused about why their contributions to open source projects are often ignored, rejected or even publicly attacked.

In this tutorial I'll tell you what you can do to help get your contributions into an open source project, and how to avoid the most common mistakes.

Table of contents
1. Discuss your changes early
2. Submit patches, not lumps of code
3. Split your patches up
4. Patch against the latest development version
5. Run any tests
6. Update documentation and tests
7. Legal issues
8. Lost patches
9. Accept rejection
10. Nuts and bolts, and external resources

Some terms you will need to know:

upstream: This is the catch-all term for the ultimate source of the project. It's the core group of contributors, their mailing lists, website and so on. For example, while many companies distribute the Apache webserver, there is only one upstream, at apache.org.
contributor: That's you, and anyone else contributing changes, bug fixes, clean ups and enhancements to upstream.

Discuss your changes early

Don't write lots of code, dump it on the project right at the end and walk away.

Upstream hates this. They are often volunteers with very limited time, and code is hard to understand, particularly when it comes in large quantities. If you walk away once your contribution is done, it's even harder for upstream, because there is no one left they can ask for help and explanations.

Do discuss your changes early on the upstream mailing lists, before you start writing too much code. Listen to feedback and change your plans if you need to.

You'll often see this principle referred to as "Release early, release often". The phrase comes from a chapter of Eric Raymond's essay The Cathedral and the Bazaar, where Raymond was talking about something slightly different: early and frequent releases of the Linux kernel. Your contribution probably isn't quite so grand, but it's still code, and it still benefits from early discussion, early and frequent releases, feedback from upstream, and testing from users.

Submit patches, not lumps of code

Don't submit lumps of code such as an alternate definition of a function, or an alternate source file, or (worst of all) a whole replacement tar.gz/zip file of the project.

Upstream developers, remember, don't have much time. They want to know exactly what has changed. Developers use a special format to show what has changed, called a patch.

Learn how to make patches in the Nuts and bolts section of this tutorial below.

Do create and submit patches.

Patches help upstream developers by showing only what you have changed. Existing tools are also designed to work with patches, so patches are easy for upstream to review and apply.

If you submit a lump of code, then you are forcing the upstream developers to make a patch themselves so they can see what has changed. This is unfair to them, and in fact many won't bother, or won't have the time, to look.

Here's a nice example of a simple patch submitted to an open source project. Notice that there is a clear subject line and explanation of what the patch is doing, and then the patch itself shows just what was changed.

Split your patches up

Don't submit huge patches or patches which make several unrelated changes.

Firstly, huge patches are almost as bad as huge lumps of code: they are hard for overworked, underpaid upstream developers to keep in their heads and understand all at once. Secondly, making unrelated changes in a single patch is bad because perhaps only some of the changes can be accepted while the others need more work, and a single patch forces upstream to accept or reject them all together.

There are several tools to help you maintain sets of patches. Find out more about them in the Nuts and bolts section of this tutorial below.

Do split up your patches into a series of small, easy to understand changes, and if your patches are unrelated to each other then put them in totally separate email threads.

Unlike my other advice, there is something of an art to splitting up patches. Perhaps the best way to understand it is to look at some beautiful examples of the art, and there is nowhere better to look than the Linux kernel mailing list, where the brightest and best programmers split conceptually complicated changes into simple series of patches. For example:

Linux KVM (native virtualization feature), a 13-part patchset by Avi Kivity et al.

Do make sure the code will compile and run after each patch is applied.

This is important for a couple of reasons. Firstly, programmers use a technique called bisection to find which patch in a series introduced a bug. If individual patches in the series don't build or run, bisection breaks too. Secondly, it makes it easier for upstream to apply some but not all of your patches, which is generally better than having them reject the whole lot.
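If upstream uses git, one way to check this before sending is to replay your series with `git rebase --exec`, which runs a command after each commit and stops at the first one that fails. The sketch below builds a throwaway repository so that it runs on its own. The file name `build.sh`, the tag `series-base`, and the use of `sh -n` (a shell syntax check) as the "build" are all hypothetical stand-ins for your project's real build command, such as `make`.

```shell
#!/bin/sh
# Sketch: confirm that every patch in a series still builds.
# Hypothetical setup: a throwaway repo whose "build" is `sh -n build.sh`
# (a syntax check); in a real project you would use e.g. `make`.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Contributor"
git config user.email "contributor@example.com"

printf 'echo base\n' > build.sh
git add build.sh
git commit -q -m "Initial import"
git tag series-base            # hypothetical marker: where the series starts

# A two-patch series on top of series-base.
printf 'echo base\necho one\n' > build.sh
git commit -q -am "Add step one"
printf 'echo base\necho one\necho two\n' > build.sh
git commit -q -am "Add step two"

# Replay each commit after series-base, running the build after every
# one. The rebase stops at the first commit whose build fails.
git rebase -q --exec 'sh -n build.sh' series-base
echo "every patch in the series builds"
```

A series that passes this check is also one that `git bisect` can step through without tripping over commits that never built.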

Patch against the latest development version

Find links to documentation on the common version control systems in the Nuts and bolts section of this tutorial.

Work with the most recent development version. If upstream has a public version control system (CVS, Subversion, Git or whatever), always use that, and supply patches which apply cleanly against it.

Upstream developers use the development version, and they will need to apply your patch to the development version in their version control system. If it doesn't apply cleanly, that is a lot more work for them, and remember that they are probably volunteers and definitely short of time.

Development versions can sometimes be quite different from the latest stable release, especially on projects which do rapid development. So make sure you are always following the latest version and releasing your patches against it.

Sometimes you will also find yourself with patches which haven't been accepted by upstream yet. Perhaps they just need more work or testing. It's important to keep those up to date with the development version, and if necessary you should periodically post a new version of the patch to the upstream mailing lists. (This process is known as rebasing the patch.)
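With git, keeping an unaccepted patch current usually means fetching the latest upstream work and replaying your commits on top of it, for example with `git pull --rebase`. The self-contained sketch below simulates this with two local repositories: "upstream" plays the project and "mine" is the contributor's clone. The repository names, identities and commit messages are invented for illustration.

```shell
#!/bin/sh
# Sketch: rebase a not-yet-accepted patch onto newer upstream work.
# Both repositories are throwaway, hypothetical stand-ins.
set -e
work=$(mktemp -d)
cd "$work"
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="ex@example.com"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="ex@example.com"

# The "project", with one release.
git init -q upstream
echo "release 1" > upstream/README
git -C upstream add README
git -C upstream commit -q -m "Release 1"

# The contributor clones it and commits a patch.
git clone -q upstream mine
echo "my feature" > mine/feature.txt
git -C mine add feature.txt
git -C mine commit -q -m "Add my feature"

# Meanwhile, upstream development carries on.
echo "more work" >> upstream/README
git -C upstream commit -q -am "Upstream development"

# Fetch the new upstream commits and replay "Add my feature" on top.
cd mine
git pull -q --rebase
git log --format=%s            # newest commit first
```

After the pull, "Add my feature" sits directly on top of "Upstream development", which is the state you want before posting a refreshed version of the patch.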

Run any tests

Run any tests that come with the project to make sure your patch doesn't break them. Test your patch thoroughly to make sure it really works.

Again, testing is important and hard to do well. It's not nice to push this task on to the upstream developers. You should make sure that your patch works as advertised and doesn't break any existing code.

Many projects include automated tests. You would typically do something like make check or make test to run them.

Update documentation and tests

Don't just add your feature to the code.

If you just add the feature to the code, you're effectively asking upstream to update all the documentation, manual pages, web pages and so on, with documentation for your feature. Upstream may even need to add more automated tests to check that your feature continues to work in future.

Remember that upstream developers are often volunteers, and writing documentation in any case is hard work.

Do update the documentation and add tests.

Help the upstream developers by making a complete feature. A complete feature includes all the necessary documentation so users will know about it. It will include any automated tests so that everyone can be sure that future changes won't break the feature.

Legal issues

Find out about the common free software licenses in the Nuts and bolts section of this tutorial.

Do understand the license and if necessary get permission from your employer.

If you contribute during work time (or in some jurisdictions, if you are employed at all) then you may need to ask for permission from your employer before contributing significant changes to open source projects.

Some projects need a Signed-off-by: line added to the patch, or some other way to attribute the source of a patch back to a particular person. Check whether the upstream web pages or sources contain files describing how to contribute patches, and follow their advice.
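In git, `git commit -s` (long form `--signoff`) appends a `Signed-off-by:` line built from your configured `user.name` and `user.email`. Check the project's own contribution notes for what the sign-off means there; the Linux kernel, for instance, ties it to its Developer's Certificate of Origin. The repository, file, commit message and identity below are placeholders.

```shell
#!/bin/sh
# Sketch: add a Signed-off-by line to a commit with `git commit -s`.
# The repository, file and identity are placeholders.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Jane Hacker"
git config user.email "jane@example.com"

echo "fix" > fix.txt
git add fix.txt
git commit -q -s -m "Fix off-by-one in the frobnicator"

# Print the full commit message; it ends with the sign-off line.
git log -1 --format=%B
```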

Lost patches

If no one comments on your patch at all, or it is accepted but never actually added upstream, then after a suitable amount of time has passed you should bring up the subject again.

Don't just post exactly the same message.

If no one commented on your patch the first time, then you need to rephrase your explanation of the patch, and make sure you didn't break one of the rules above.

Do gently and infrequently remind upstream about your lost patch.

Accept rejection

Do accept that some changes will be rejected. Sometimes they just aren't right for the project, or the upstream developers don't want to maintain them.

It's true that some changes won't be accepted by upstream. There can be many reasons for this, but in well run projects it's usually because they don't fit in well with the overall goals of the project. Maybe editing graphics isn't the right feature for that small, fast word processor project. Sometimes it happens for less noble reasons, but for whatever reason, you can't force upstream developers to take your patch.

Nevertheless, open source software gives you a great deal of freedom, and if you really think that your patch is good enough for the project, you generally have two options:

Maintain what are known as out of tree patches.
Fork the project.

Out of tree patches

An out of tree patch is a patch that you host and continue to update, but which isn't an official part of the upstream project.

Out of tree patches are very common in certain projects (notably the Linux kernel where there are probably thousands of them).

However, you must realise that there is a lot of work involved. Any time upstream releases a new version, it may break your patch, requiring you to rebase it (which can be a lot of work). In addition, you may need to supply your users with multiple versions of the patch, one for each upstream release. This is why it's a good idea to work with upstream to get your patch accepted if at all possible. Even splitting your patch and getting parts of it upstream is usually a worthwhile goal.

Forking the project

The nuclear option when it comes to open source is to announce that you're going to fork the project.

Forking the project means that you'll set up your own project and you'll be your own upstream. You can then, of course, add any patches you like to your version.

Forking the project can be a good idea, particularly where the current upstream is dead or unresponsive, or has by its own actions made itself very unpopular with the other contributors. But check first to see whether anyone else is interested in starting a new project. It's better to band together than to divide developer attention between several similar projects.

If you fork a project, be prepared. In particular, don't use the same or a similar name, which could create confusion or even legal problems. Do be prepared to invest a large amount of work, at least as much effort as the original upstream (remember this rule if the project you are forking has hundreds of contributors). Do have clear goals which are different from the original upstream's. And if possible, fork amicably: upstream may be happy for you to take the project in a different direction.

Eric Raymond explains forking projects in terms of developer mindshare, which he calls the noosphere, in this essay.

Nuts and bolts, and external resources

Creating patches

Note that if you are using a version control system, there are usually special commands provided to make patches. See the next section below.

To create a patch for a single file, back up the original file before you start making any changes:

cp file.c file.c.orig

Then you can edit the file, test your changes and so on. When you are happy with the changes, make a patch like this:

diff -u file.c.orig file.c > my-excellent-feature.patch

Notes:

Always use the unified diff format, diff -u. Don't forget the -u option.
Always put the old file first, as in diff -u oldfile newfile; otherwise you'll end up with what is known as a reversed patch.
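Putting those steps together, here is a self-contained run you can try in an empty directory. The contents of `file.c` and the `sed` command standing in for your editing are made up for illustration. Note that `diff` exits with status 1 when the files differ, so a script running under `set -e` has to tolerate that status rather than treat it as an error.

```shell
#!/bin/sh
# Sketch: make a unified diff for a single edited file.
# file.c and its contents are hypothetical.
set -e
dir=$(mktemp -d)
cd "$dir"

printf 'int main(void)\n{\n  return 1;\n}\n' > file.c

cp file.c file.c.orig                                # 1. back up before editing
sed 's/return 1;/return 0;/' file.c.orig > file.c    # 2. "edit" the file

# 3. Old file first, new file second. diff exits 1 when the files
#    differ, so only a status above 1 counts as a real error.
status=0
diff -u file.c.orig file.c > my-excellent-feature.patch || status=$?
[ "$status" -le 1 ]

cat my-excellent-feature.patch
```

In the output, lines starting with `-` come from `file.c.orig` and lines starting with `+` come from your edited `file.c`; if they appear the other way round, you have made a reversed patch.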

To create a patch across multiple files (e.g. against a release of the software), unpack the software twice:

tar zxf foozball-1.0.tar.gz
mv foozball-1.0 foozball-1.0.orig
tar zxf foozball-1.0.tar.gz

This should create two directories like this:

$ ls
foozball-1.0.orig/
foozball-1.0/

Make your changes in the second directory (the one which isn't .orig). Then to create a patch, come back to the top and do:

diff -ur foozball-1.0.orig foozball-1.0 > four-players.patch
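To try the whole recipe without a real download, this sketch first fabricates a tiny `foozball-1.0.tar.gz` (the tutorial's example project; the files inside it are invented here) and then follows the unpack, edit and `diff -ur` steps exactly as described. As with single files, `diff` exits with status 1 when it finds differences, so the script allows that status.

```shell
#!/bin/sh
# Sketch: patch across a whole source tree with diff -ur.
# The foozball-1.0 tarball and its files are invented for this demo.
set -e
dir=$(mktemp -d)
cd "$dir"

# Fake release tarball, standing in for a real download.
mkdir -p foozball-1.0/src
printf 'players = 2\n' > foozball-1.0/src/game.conf
printf 'Foozball\n' > foozball-1.0/README
tar zcf foozball-1.0.tar.gz foozball-1.0
rm -r foozball-1.0

# Unpack twice: one pristine copy (.orig) and one to edit.
tar zxf foozball-1.0.tar.gz
mv foozball-1.0 foozball-1.0.orig
tar zxf foozball-1.0.tar.gz

# Make the change in the copy that is NOT .orig.
printf 'players = 4\n' > foozball-1.0/src/game.conf

# From the top directory: old tree first, new tree second.
status=0
diff -ur foozball-1.0.orig foozball-1.0 > four-players.patch || status=$?
[ "$status" -le 1 ]
cat four-players.patch
```

Only the file you changed (`src/game.conf`) appears in the patch; the untouched `README` does not.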

Read the manual page for the diff command for more options.

Version control systems

Most of the time you'll actually be using a version control system such as CVS, Subversion, git or Mercurial; these days there are many of them. Most of these have commands for generating patches automatically: you just check out the code, make any changes you want, and run a command, and the software makes the patch for you. The table below gives some common commands and links to further documentation.

System	Command	Documentation
CVS	`cvs diff -u`	Manual
Subversion	`svn diff`	Home page
Git	`git format-patch`	Manual page
Mercurial	`hg diff`	Home page
Bazaar (bzr)	`bzr send`	Manual page, Home page
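As an example of the Git row, `git format-patch` writes one email-ready file per commit, named after the commit's subject line, with the commit message as the explanation followed by the diff. This sketch uses a throwaway repository; the file name, identity and commit messages are invented for illustration.

```shell
#!/bin/sh
# Sketch: turn a commit into a patch file with git format-patch.
# Throwaway repository; all names below are invented for illustration.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Contributor"
git config user.email "contributor@example.com"

printf 'players = 2\n' > game.conf
git add game.conf
git commit -q -m "Initial import"

# Subject line first, then a paragraph explaining the change.
printf 'players = 4\n' > game.conf
git commit -q -a -m "Add four player mode" \
    -m "Allow up to four players instead of two."

# One file per commit after HEAD~1, written to the current directory.
git format-patch HEAD~1
```

The generated file starts with a `Subject: [PATCH] ...` header taken from the commit's subject line, which gives you the clear subject and explanation that upstream looks for; how you then send it (for example with `git send-email`) depends on the project.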

Tools for managing sets of patches

The main tool for managing sets of patches is called quilt. There is a good talk about quilt here.

Git has lightweight branches (a branch is a sequence of patches), git rebase --interactive ... to reorder/edit/merge/split change sets, git stash, and much more.

Mercurial has the mq (Mercurial Queues) extension.

Software licenses

The most common licenses you will encounter are:

the GNU General Public License (GPL) version 2 or version 3,
the GNU Lesser General Public License (LGPL) version 2 or version 3,
and the various BSD-style licenses.

For a fuller list see the Wikipedia page comparison of free software licenses.

Other guidelines on submitting patches

The Linux kernel guidelines are similar to what is in this tutorial, and are a must-read if you want to submit code back to the kernel.

The coreutils HACKING file describes how to format and submit patches for the GNU coreutils project. This goes into a lot of detail about using git for submitting patches. (Thanks Jim Meyering).

Authors

Originally written by Richard Jones, with feedback from Daniel Berrange, Daniel Veillard, Jim Meyering and Christophe Troestler.

rjones AT redhat DOT com