One of the things I've observed since I started working at Red Hat is that even seasoned programmers who are not used to the "open source thing" can become very confused about why their contributions to open source projects are often ignored, rejected or even publicly attacked.
In this tutorial I'll tell you what you can do to help get your contributions into an open source project, and how to avoid the most common mistakes.
Some terms you will need to know:
Don't write lots of code, dump it on the project right at the end and walk away.
Upstream hates this. They are often volunteers with very limited time, and code is hard to understand, particularly when it comes in large quantities. If you walk away having done your contribution, it's even harder for upstream because there is nowhere they can ask for help and explanations.
Do discuss your changes early on the upstream mailing lists, before you start writing too much code. Listen to feedback and change your plans if you need to.
You'll often see this principle referred to as "Release early, release often", and it comes from a chapter in Eric Raymond's essay The Cathedral and the Bazaar. There, Raymond was talking about something slightly different: early and frequent releases of the Linux kernel. Your contribution probably isn't quite so grand, but it's still code and it still benefits from early discussion, frequent, early releases of code, feedback from upstream, and testing from users.
Don't submit lumps of code such as an alternate definition of a function, or an alternate source file, or (worst of all) a whole replacement tar.gz/zip file of the project.
Upstream developers, remember, don't have much time. They want to know just what has changed. There is a special format used by developers to show what has changed, called a patch.
Learn how to make patches in the Nuts and bolts section of this tutorial below.
Do create and submit patches.
Patches help upstream developers by just showing what you have changed. Existing tools are also designed to use patches, making them especially easy to use.
If you submit a lump of code, then you are forcing the upstream developers to make a patch themselves so they can see what has changed. This is unfair on them, and in fact many won't even bother or indeed have the time to look.
Here's a nice example of a simple patch submitted to an open source project. Notice that there is a clear subject line and explanation of what the patch is doing, and then the patch itself shows just what was changed.
Don't submit huge patches or patches which make several unrelated changes.
In the first place, huge patches are almost as bad as huge lumps of code: hard for overworked, underpaid upstream developers to keep in their heads and understand all at once. Secondly, making unrelated changes in a single patch is bad because perhaps only some of the changes can be accepted and other changes need more work.
There are several tools to help you maintain sets of patches. Find out more about them in the Nuts and bolts section of this tutorial below.
Do split up your patches into a series of small, easy to understand changes, and if your patches are unrelated to each other then put them in totally separate email threads.
Unlike my other advice, there is something of an art to splitting up patches. Perhaps the best way to understand it is to look at some beautiful examples of the art, and there is nowhere better to look than on the Linux kernel mailing list where the brightest and best programmers split conceptually complicated changes into simple series of patches. Here are some examples:
Do make sure the code will compile and run after each patch is applied.
This is important for a couple of reasons. Firstly, programmers use a technique called bisection to find which patch in a series introduced a bug. If individual patches in the series break things, then this breaks bisection too. Secondly, it makes it easier for upstream to apply some but not all of your patches, which is generally better than having them reject the whole lot.
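One way to check a series before posting it is to apply the patches one at a time and run the build after each. The sketch below is only an illustration: the directory names, the patch file names and the check_series helper are all hypothetical, and "sh -n" (a syntax check) stands in for whatever build and test commands the real project uses.

```shell
# Work in a scratch directory so nothing outside it is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# A stand-in project: one shell script, with "sh -n" (syntax check only)
# playing the role of the build. All names here are hypothetical.
mkdir v0 v1 v2
printf 'echo hello\n' > v0/run.sh
printf 'echo hello\necho world\n' > v1/run.sh
printf 'if true; then\necho hello\necho world\nfi\n' > v2/run.sh

# A two-patch series: v0 -> v1 -> v2. diff exits with status 1 when it
# finds differences, which is the expected outcome here.
diff -ur v0 v1 > 0001-add-world.patch || [ $? -eq 1 ]
diff -ur v1 v2 > 0002-wrap-in-if.patch || [ $? -eq 1 ]

# check_series TREE PATCH...
# Apply each patch to TREE in order and run the check after each one,
# stopping at the first patch that fails to apply or breaks the check.
check_series() {
    tree=$1
    shift
    for p in "$@"; do
        patch -d "$tree" -p1 -s -f < "$p" || { echo "FAILED to apply: $p"; return 1; }
        sh -n "$tree/run.sh" || { echo "BROKEN after: $p"; return 1; }
        echo "ok after: $p"
    done
}

# Start from a pristine copy of the original tree.
cp -r v0 tree
check_series tree 0001-add-world.patch 0002-wrap-in-if.patch
```

If your series lives in Git, git rebase --exec can run a command after each commit in a similar way.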
Find links to documentation on the common version control systems in the Nuts and bolts section of this tutorial.
Work with the most recent development version and if the upstream has a public version control system (CVS, Subversion, Git or whatever) always use that and supply patches which cleanly apply against that.
Upstream developers use the development version, and they will certainly need to apply your patch against the development version or to their version control system. If it doesn't apply cleanly, then that is a lot more work for them, and remember they are probably volunteers and definitely short of time.
Development versions can sometimes be quite different from the latest stable release, especially on projects which do rapid development. So make sure you are always following the latest version and releasing your patches against it.
Another thing to add here is that sometimes you will find yourself with patches which haven't been accepted by upstream yet. Perhaps they just need more work or testing. It's important to keep those up to date with the development version, and if necessary you should periodically release a new version of the patch and post that to the upstream mailing lists. (This process is known as rebasing the patch.)
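Before posting, you can check whether a patch still applies to the development version without touching any files. This is a minimal sketch, assuming a patch program that supports the --dry-run option (GNU patch does); the directory names, file contents and the applies_cleanly helper are hypothetical.

```shell
# Work in a scratch directory so nothing outside it is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Stand-ins for the release you started from and your modified copy.
mkdir stable stable.mine devel
printf 'alpha\nbeta\ngamma\n' > stable/notes.txt
printf 'alpha\nBETA\ngamma\n' > stable.mine/notes.txt

# diff exits with status 1 when it finds differences, as expected here.
diff -ur stable stable.mine > feature.patch || [ $? -eq 1 ]

# Meanwhile the development version has rewritten the same line.
printf 'alpha\nbeta (reworded)\ngamma\n' > devel/notes.txt

# applies_cleanly TREE PATCH
# Succeed only if PATCH would apply to TREE with no rejected hunks.
# --dry-run leaves the tree untouched; -f stops patch asking questions.
applies_cleanly() {
    patch -d "$1" -p1 --dry-run -s -f < "$2" > /dev/null 2>&1
}

if applies_cleanly stable feature.patch; then
    echo "applies to the stable release"
fi
if ! applies_cleanly devel feature.patch; then
    echo "needs rebasing against the development version"
fi
```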
Run any tests that come with the project to make sure your patch doesn't break them. Test your patch thoroughly to make sure it really works.
Again, testing is important and hard to do well. It's not nice to push this task on to the upstream developers. You should make sure that your patch works as advertised, and doesn't break any existing code.
Many projects include automated tests. You would typically do something like make check or make test to run them.
Don't just add your feature to the code.
If you just add the feature to the code, you're effectively asking upstream to update all the documentation, manual pages, web pages and so on, with documentation for your feature. Upstream may even need to add more automated tests to check that your feature continues to work in future.
Remember that upstream developers are often volunteers, and writing documentation in any case is hard work.
Do update the documentation and add tests.
Help the upstream developers by making a complete feature. A complete feature includes all the necessary documentation so users will know about it. It will include any automated tests so that we can be sure that future changes won't break the feature.
Find out about the common free software licenses in the Nuts and bolts section of this tutorial.
Do understand the license and if necessary get permission from your employer.
If you contribute during work time (or in some jurisdictions, if you are employed at all) then you may need to ask for permission from your employer before contributing significant changes to open source projects.
Some projects need a Signed-off-by line added to the patch, or some other way to attribute the source of a patch back to a particular person. Take a look to see if the upstream web pages or sources contain files describing how to contribute patches, and follow their advice.
If no one commented on your patch at all, or it was accepted but not actually added upstream, then after a suitable amount of time has passed you should bring up the subject again.
Don't just post exactly the same message.
If no one commented on your patch the first time, then you need to rephrase your explanation of the patch, and make sure you didn't break one of the rules above.
Do gently and infrequently remind upstream about your lost patch.
Do accept that some changes will be rejected. Sometimes they just aren't right for the project, or the upstream developers don't want to maintain them.
It's true that some changes won't be accepted by upstream. There can be many reasons for this, but in well run projects it's usually because they don't fit in well with the overall goals of the project. Maybe editing graphics isn't the right feature for that small, fast word processor project. Sometimes it happens for less noble reasons, but for whatever reason, you can't force upstream developers to take your patch.
Nevertheless, open source software gives you a great deal more freedom, and if you really think that your patch is good enough for the project, you generally have two options:
Maintain out of tree patches.
An out of tree patch is a patch that you host and continue to update, but which isn't an official part of the upstream project.
Out of tree patches are very common in certain projects (notably the Linux kernel where there are probably thousands of them).
However you must realise that there is a lot of work involved. Potentially any time upstream release a new version, it may break your patch, requiring you to rebase the patch (which can be a lot of work). In addition you may need to supply your users with multiple versions of the patch, one for each different upstream release. This is why it's a good idea to work with upstream to get your patch in there if at all possible. Even splitting your patch and getting parts of it upstream is usually a worthwhile goal.
The nuclear option when it comes to open source is to announce that you're going to fork the project.
Forking the project means that you'll set up your own project and you'll be your own upstream. You can then, of course, add any patches you like to your version.
Forking the project can be a good idea, particularly where the current upstream is dead, unresponsive, or has by its own actions become very unpopular with the other contributors. But check first to see if anyone else is interested in starting a new project. It's better to band together instead of dividing developer attention between several similar projects.
If you fork a project, then be prepared. In particular, don't use the same or a similar name which could create confusion or even legal problems. Do be prepared to invest a large amount of work, at least as much effort as the original upstream (remember this rule if the project you are forking has hundreds of contributors). Do have clear goals which are different from the original upstream. And if possible fork amicably — it's possible that upstream will be happy for you to take the project in a different direction.
Eric Raymond explains forking projects in terms of developer mindshare, which he calls the noosphere, in this essay.
Note that if you are using a version control system, there are usually special commands provided to make patches. See the next section below.
To create a patch for a single file, before you start making any changes back up the original file:
cp file.c file.c.orig
Then you can edit the file, test your changes and so on. When you are happy with the changes, make a patch like this:
diff -u file.c.orig file.c > my-excellent-feature.patch
Notes:
Always use diff -u. Don't forget the -u option.
The order of the arguments is diff -u oldfile newfile, otherwise you'll end up with what is known as a reversed patch.
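The whole single-file workflow can be tried out safely in a scratch directory. This is a sketch under stated assumptions: the contents of file.c are made up, and the final step, which applies the patch to a copy of the original to confirm it reproduces your edited file, is an extra sanity check rather than part of the workflow described above.

```shell
# Work in a scratch directory so nothing outside it is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# A stand-in source file (hypothetical contents).
printf 'int players = 2;\n' > file.c

# 1. Back up the original before editing.
cp file.c file.c.orig

# 2. Make the change.
printf 'int players = 4;\n' > file.c

# 3. Old file first, new file second. diff exits with status 1 when the
#    files differ, which is the expected outcome here.
diff -u file.c.orig file.c > my-excellent-feature.patch || [ $? -eq 1 ]

# 4. Sanity check: applying the patch to a copy of the original
#    should reproduce the edited file exactly.
cp file.c.orig check.c
patch -s check.c < my-excellent-feature.patch
cmp check.c file.c && echo "patch round-trips cleanly"
```

In the resulting patch, lines starting with - come from the old file and lines starting with + from the new one. If they look the wrong way round, the arguments to diff were swapped.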
To create a patch across multiple files (eg. against some release of the software), unpack the software twice:
tar zxf foozball-1.0.tar.gz
mv foozball-1.0 foozball-1.0.orig
tar zxf foozball-1.0.tar.gz
This should create two directories like this:
$ ls
foozball-1.0.orig/ foozball-1.0/
Make your changes in the second directory (the one which isn't .orig).
Then to create a patch, come back to the top and do:
diff -ur foozball-1.0.orig foozball-1.0 > four-players.patch
Read the manual page for the diff command for more options.
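Here is the multi-file procedure end to end, as a sketch you can run in a scratch directory. The tarball is built on the spot with made-up contents so that the example is self-contained. It also adds the -N option, which is not used in the command above: without it, diff only reports files that exist in one tree and not the other, and leaves their contents out of the patch.

```shell
# Work in a scratch directory so nothing outside it is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Build a tiny stand-in release tarball (hypothetical contents).
mkdir foozball-1.0
printf 'players=2\n' > foozball-1.0/config
tar czf foozball-1.0.tar.gz foozball-1.0
rm -r foozball-1.0

# Unpack twice: a pristine .orig tree and a working tree.
tar zxf foozball-1.0.tar.gz
mv foozball-1.0 foozball-1.0.orig
tar zxf foozball-1.0.tar.gz

# Edit an existing file and add a new one in the working tree.
printf 'players=4\n' > foozball-1.0/config
printf 'Four players are now supported.\n' > foozball-1.0/NEWS

# -r recurses into directories; -N treats a file missing from one tree
# as empty, so the new NEWS file is included in the patch.
# Exit status 1 just means that differences were found.
diff -urN foozball-1.0.orig foozball-1.0 > four-players.patch || [ $? -eq 1 ]

# Check it the way upstream would apply it: from inside a pristine tree,
# with -p1 to strip the leading directory name from each path.
cp -r foozball-1.0.orig upstream-tree
patch -d upstream-tree -p1 -s < four-players.patch
diff -r upstream-tree foozball-1.0 && echo "patched tree matches working tree"
```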
Most of the time you'll actually be using a version control system such as CVS, Subversion, git, Mercurial, etc. These days there are many of them. Mostly these have commands for generating patches automatically — you just check out the code, make any changes you want, and run a command and the software makes the patch for you. The table below gives some common commands and links to further documentation.
Version control system | Command to make a patch | Documentation |
---|---|---|
CVS | cvs diff -u | Manual |
Subversion | svn diff | Home page |
Git | git format-patch | Manual page |
Mercurial | hg diff | Home page |
Bazaar (bzr) | bzr send | Manual page, Home page |
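As an illustration of one row of the table, this sketch commits a change in a throwaway Git repository and writes it out as a patch file with git format-patch. It assumes git is installed; the repository name, the contributor identity and the file contents are all hypothetical.

```shell
# Work in a scratch directory so nothing outside it is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# A throwaway repository standing in for an upstream checkout.
git init -q foozball
cd foozball || exit 1
git config user.name "Example Contributor"        # hypothetical identity
git config user.email "contributor@example.com"   # hypothetical address

# The state of the project before your change.
printf 'players=2\n' > config
git add config
git commit -q -m "Initial import"

# Make the change and commit it with a clear subject line.
printf 'players=4\n' > config
git commit -q -a -m "Allow four players"

# Write the most recent commit out as a patch file in the outgoing
# directory, which git creates if it does not exist.
git format-patch -1 -o "$workdir/outgoing"
```

The generated file carries the commit's subject line, author and message along with the diff itself, so it is ready to send to a mailing list.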
The main tool for managing sets of patches is called quilt. There is a good talk about quilt here.
Git has lightweight branches (a branch is a sequence of patches), git rebase --interactive ... to reorder/edit/merge/split change sets, git stash, and much more.
Mercurial has a queues extension.
The most common licenses you will encounter are:
For a fuller list see the Wikipedia page comparison of free software licenses.
The Linux kernel guidelines are similar to what is in this tutorial, and are a must-read if you want to submit code back to the kernel.
The coreutils HACKING file describes how to format and submit patches for the GNU coreutils project. This goes into a lot of detail about using git for submitting patches. (Thanks Jim Meyering).
Originally written by Richard Jones. With feedback from Daniel Berrange, Daniel Veillard, Jim Meyering, and Christophe Troestler.