RFC 3484 on Linux

Ulrich Drepper
2007-9-4

Creating a network connection between two machines is actually a pretty complicated process. Aside from the actual network transfer there are two crucial decisions to be made right at the beginning:

If the target machine has more than one network address (IPv4, IPv6, or others) a decision must be made which one to use.
If the source machine has more than one network address and/or interface a decision must be made which one to use.

Most sites use site-local IPv4 addresses because there are not enough assigned IP numbers. For those machines which also have official IP numbers, picking the correct number is crucial. In the presence of IPv6 the situation becomes even more complicated. Each interface can have multiple IPv6 addresses. With mobile IPv6 and similar special-purpose setups, network addresses move around, creating temporary addresses and leaving behind deprecated ones.

Enter RFC 3484

The problem is of course not new. Ever since a machine could have more than one network interface the implementation had to make choices. There was no all-encompassing document describing the requirements, though.

RFC 3484 defines how to select the correct source and destination address for a connection. This has effects in two places:

While setting up a connection in the kernel.
During name lookup.

The connect() implementation in the kernel has to make decisions about the source and destination address before it returns. Calls to getpeername() and getsockname() after the connect() call must return the appropriate information. In case of connection-less communication each sendto() must make the decision.

A less obvious place for selecting source and destination addresses is the name lookup functionality. Traditional interfaces (gethostbyname() etc) just returned the found values the way the services (DNS, files, whatever) listed them. Perhaps the values were sorted or randomized. None of these solutions was terribly useful.

With the introduction of the getaddrinfo() interface the possibility arose to change this sorry state. RFC 3484 specifies in which order addresses should be returned when there is more than one. Explaining all the details here is not useful; this is what the RFC is about. But in short, the getaddrinfo() implementation determines the source address for each of the destination addresses (or more correctly: it lets the kernel decide by using connect()). Then it sorts the destination addresses based on the relative merits of the source/destination pairs.

The details of the sorting require a lot of knowledge of the network implementation. A few details perhaps worth mentioning are:

A destination address for which it is possible to determine a source address should be preferred over a destination address for which no source address can be determined.
If a destination machine has an internal and an external address and the source machine is on the internal network, it is best to use the internal address.

The idea behind all this is that the caller of getaddrinfo() tries to use the list of returned addresses the way the implementation returns them. I.e., the code should look like this:

struct addrinfo *res = NULL;
if (getaddrinfo(hostname, NULL, NULL, &res) == 0)
  {
    struct addrinfo *runp = res;
    while (runp != NULL)
      {
        /* ... use *runp ... */
        runp = runp->ai_next;
      }
    freeaddrinfo(res);
  }

This way the preferred addresses, which have the highest probability of working correctly, are tried first. Why is this important? Just look at how IPv6 is deployed today. If anything, interfaces have link-local IPv6 addresses. Maybe there are some sites which have internal IPv6 networks. Rarely does one find outgoing IPv6 interfaces. But there are nevertheless some progressive sites which list IPv6 addresses in their DNS records. The result: if the application tried the advertised IPv6 address first, the likelihood of a failed connection attempt would be high. This might be an instantaneous error but it could also require a timeout. In any case the connection attempt would take longer than it should.

By sorting the getaddrinfo() results many problems like this can be avoided. Just also remember to always use the AI_ADDRCONFIG flag.
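Combining the loop above with the AI_ADDRCONFIG flag might look like the following sketch. The helper name connect_in_order() and the choice of a TCP socket are assumptions made for illustration; the point is only that the addresses are tried in exactly the order getaddrinfo() returns them.

```c
#include <netdb.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: connect a TCP socket to HOST:SERVICE, trying the
   addresses in the order getaddrinfo() returned them.  Returns the
   connected descriptor, or -1 if no address worked.  */
int
connect_in_order (const char *host, const char *service)
{
  struct addrinfo hints;
  memset (&hints, 0, sizeof hints);
  hints.ai_family = AF_UNSPEC;       /* Accept IPv4 and IPv6 results.  */
  hints.ai_socktype = SOCK_STREAM;   /* Assumption: a TCP client.  */
  hints.ai_flags = AI_ADDRCONFIG;    /* Skip families without a local address.  */

  struct addrinfo *res = NULL;
  if (getaddrinfo (host, service, &hints, &res) != 0)
    return -1;

  int fd = -1;
  for (struct addrinfo *runp = res; runp != NULL; runp = runp->ai_next)
    {
      fd = socket (runp->ai_family, runp->ai_socktype, runp->ai_protocol);
      if (fd == -1)
        continue;
      if (connect (fd, runp->ai_addr, runp->ai_addrlen) == 0)
        break;                       /* The first working address wins.  */
      close (fd);
      fd = -1;
    }

  freeaddrinfo (res);
  return fd;
}
```

A caller would use it as connect_in_order ("www.example.org", "80") and close the descriptor when done. The code does not reorder or filter the list itself; that is left entirely to the system implementation.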

State of RFC 3484

As of this writing, RFC 3484 has not been superseded by any other RFC. This is a problem. RFC 3879 declares site-local IPv6 addresses obsolete. This decision is not popular with many parts of the community. The argumentation was that site-local IPv6 addresses are too hard to use. Given that one of the RFC authors is from MSFT I believe that: almost everything is too complicated for Winblowz people. But I digress. RFC 4193 introduces the new concept of Unique Local IPv6 Unicast Addresses (ULAs). This might work.

But: RFC 3484 explicitly defines how site-local IPv6 addresses should be treated. Simply declaring all special handling for site-local addresses as invalid (as RFC 3879 does) does not change the fact that

implementations of RFC 3484 existed before RFC 3879,
no revision of RFC 3484 exists which explains how RFC 4193 addresses should be handled (if anything is needed at all).

Because of the first point, and because RFC 3879 specifies that the former site-local address range should not be reused, many implementations will continue to handle site-local addresses as specified in RFC 3484.

There is also RFC 4429 which introduces Optimistic Duplicate Address Detection. The good news is that this RFC references RFC 3484 but it is not entirely clear.

The BIG Problem

There are several problems with the current RFC 3484 specification. Here is the biggest. Consider a typical machine at a larger site (company, university). This can even happen on home networks. Every machine today still has an IPv4 address and because no site gets assigned large blocks anymore the local networks are NATed. I.e., the IPv4 addresses are site-local addresses: 10/8, 172.16/12, or 192.168/16.

At the same time most sites do not have official IPv6 addresses assigned because ISPs still frequently do not give them out. So many sites resort to using site-local or ULA IPv6 addresses. The IPv6 spec does not contain anything similar to NAT for IPv6, for good reason. The extended address space is meant to make address translation unnecessary.

But what happens if, on such a machine which has only site-local IPv4 and IPv6 addresses, one looks up a host which has a global IPv4 and a global IPv6 address? According to the recommended setting in RFC 3484 (i.e., prefer IPv6 addresses) the sorting results in placing the IPv6 address first. Always.

But this is a problem. The IPv6 address is not NATed, therefore trying to connect to a global IPv6 address from the site-local address will not work. On the other hand, the IPv4 address usually is NATed, and therefore connecting to the global IPv4 address from the site-local address will usually work.

The problem is that the address labelling does not distinguish between global addresses and site-local addresses. For IPv4 this is OK, for now, because NAT is used everywhere. But global and site-local/ULA IPv6 addresses do not match.

A solution for the problem is to assign different labels in rule 5 of the sorting algorithm for site-local and ULA IPv6 addresses. I.e., the table for the labels in section 2.1 gets extended:

      Prefix       Label
      ::1/128          0
      ::/0             1
      2002::/16        2
      ::/96            3
      ::ffff:0:0/96    4
      fec0::/10        5
      fc00::/7         6

This is what is done in glibc. With this change in place, the problem machine described above always sorts a global IPv4 address before a global IPv6 address, because the labels of the site-local and global IPv4 addresses match while the labels of the site-local and global IPv6 addresses do not. Another reason to update RFC 3484.
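The effect of the extended table can be checked with a small sketch. The helper label_for() is hypothetical and is not the glibc code: it does a longest-prefix match against the table above and looks IPv4 addresses up in their IPv4-mapped form, as RFC 3484 prescribes. The addresses mentioned afterwards are made up for illustration.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>

struct label_entry
{
  const char *prefix;   /* Prefix in textual IPv6 form.  */
  unsigned int bits;    /* Prefix length in bits.  */
  int label;
};

/* The extended label table from the text.  */
static const struct label_entry label_table[] =
  {
    { "::1",        128, 0 },
    { "::",           0, 1 },
    { "2002::",      16, 2 },
    { "::",          96, 3 },
    { "::ffff:0:0",  96, 4 },
    { "fec0::",      10, 5 },
    { "fc00::",       7, 6 },
  };

/* Return nonzero if the first BITS bits of ADDR and PREFIX agree.  */
static int
prefix_matches (const struct in6_addr *addr, const struct in6_addr *prefix,
                unsigned int bits)
{
  unsigned int full = bits / 8;
  unsigned int rest = bits % 8;
  if (memcmp (addr->s6_addr, prefix->s6_addr, full) != 0)
    return 0;
  if (rest == 0)
    return 1;
  unsigned char mask = (unsigned char) (0xff << (8 - rest));
  return (addr->s6_addr[full] & mask) == (prefix->s6_addr[full] & mask);
}

/* Hypothetical helper: label of the textual address TEXT, or -1 if TEXT
   cannot be parsed.  IPv4 addresses are looked up in IPv4-mapped form.  */
int
label_for (const char *text)
{
  struct in6_addr addr;
  struct in_addr v4;
  if (inet_pton (AF_INET, text, &v4) == 1)
    {
      memset (&addr, 0, sizeof addr);
      addr.s6_addr[10] = 0xff;
      addr.s6_addr[11] = 0xff;
      memcpy (&addr.s6_addr[12], &v4, sizeof v4);
    }
  else if (inet_pton (AF_INET6, text, &addr) != 1)
    return -1;

  int best_label = -1;
  int best_bits = -1;
  for (size_t i = 0; i < sizeof label_table / sizeof label_table[0]; ++i)
    {
      struct in6_addr prefix;
      if (inet_pton (AF_INET6, label_table[i].prefix, &prefix) != 1)
        continue;
      /* Keep the entry with the longest matching prefix.  */
      if ((int) label_table[i].bits > best_bits
          && prefix_matches (&addr, &prefix, label_table[i].bits))
        {
          best_bits = (int) label_table[i].bits;
          best_label = label_table[i].label;
        }
    }
  return best_label;
}
```

For a source 192.168.1.5 and a destination 192.0.2.1 both calls return label 4, so rule 5 sees a match. For a source fec0::5 and a destination 2001:db8::1 the labels are 5 and 1, so the IPv6 pair loses the comparison.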

The IPv4 Problem

Rule 9 in RFC 3484 specifies that the longer matching prefix wins a comparison between two addresses of the same family. This makes perfect sense for IPv6. But the IPv4 address space is not allocated hierarchically. Neighboring addresses do not mean that the networks are neighboring, too.

IPv4 addresses can only meaningfully be compared if they are in the same subnet. Therefore glibc implements a modified rule 9:

  If Netmask(DA) == Netmask(Source(DA))
    If CommonPrefixLen(DA, Source(DA)) > CommonPrefixLen(DB, Source(DB))
      Prefer DA
    Else If CommonPrefixLen(DA, Source(DA)) < CommonPrefixLen(DB, Source(DB))
      Prefer DB
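In C the modified rule could be sketched roughly as follows. This is one reading of the pseudocode, not the glibc source: it assumes the netmask is that of the interface owning the source address, and that a destination outside that subnet counts as having a common prefix of length zero. All function names are hypothetical, and addresses are taken in host byte order.

```c
#include <stdint.h>

/* Number of leading bits A and B have in common (0..32).  Both are
   IPv4 addresses in host byte order.  */
static unsigned int
common_prefix_len (uint32_t a, uint32_t b)
{
  uint32_t diff = a ^ b;
  unsigned int len = 0;
  while (len < 32 && (diff & UINT32_C (0x80000000)) == 0)
    {
      diff <<= 1;
      ++len;
    }
  return len;
}

/* Prefix length that counts for rule 9: the common prefix if DST lies in
   the subnet of SRC (SRC_PREFIXLEN bits, at most 32), otherwise 0.
   Assumption: addresses outside the subnet are treated as not matching.  */
static unsigned int
rule9_prefix_len (uint32_t dst, uint32_t src, unsigned int src_prefixlen)
{
  uint32_t netmask = (src_prefixlen == 0
                      ? 0 : (uint32_t) (UINT32_MAX << (32 - src_prefixlen)));
  if ((dst & netmask) != (src & netmask))
    return 0;
  return common_prefix_len (dst, src);
}

/* Return <0 if DA is preferred, >0 if DB is preferred, and 0 if the rule
   does not decide.  SA and SB are the source addresses for DA and DB.  */
int
rule9_compare_v4 (uint32_t da, uint32_t sa, unsigned int sa_prefixlen,
                  uint32_t db, uint32_t sb, unsigned int sb_prefixlen)
{
  unsigned int la = rule9_prefix_len (da, sa, sa_prefixlen);
  unsigned int lb = rule9_prefix_len (db, sb, sb_prefixlen);
  if (la > lb)
    return -1;
  if (la < lb)
    return 1;
  return 0;
}
```

With a source of 192.168.1.10/24, the destination 192.168.1.20 shares 27 leading bits and is in the subnet, while 192.168.2.20 is outside it and counts as zero, so the first destination is preferred.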

State of RFC 3484 in Linux

The kernel has a full-fledged implementation of RFC 3484 for IPv6 addresses at least since the early 2.6 days. Some of the rules are only implemented if certain kernel options are selected. Not every kernel binary needs to have support for mobile IPv6. The only unsupported rule from RFC 3484 (see the RFC text for details) is number 4.

On the userlevel side, the getaddrinfo() implementation in glibc got support for RFC 3484 sorting in November 2003. A few problems have been found since then but in general it worked. Support for rules 3 and 7 was only added in April 2006, support for rule 4 in September 2006. Because the number of sites for which this makes a difference is very low it should not matter much. With the correct kernel and glibc version it is possible to have a complete getaddrinfo implementation.

The implementation nowadays also has support for configuration by the system administrator, as required in RFC 3484. Information placed in the /etc/gai.conf file can replace the label and precedence tables the implementation uses by default. This allows adjusting the implementation for local requirements, e.g., the handling of site-local addresses. An example configuration can be found in /usr/share/docs/glibc-common-*/gai.conf. It is not installed in /etc by default since the absence of the file indicates that the implementation should use the default. A configuration file for the default RFC 3484 setting would look like this:

       label       ::1/128         0
       label       ::/0            1
       label       2002::/16       2
       label       ::/96           3
       label       ::ffff:0:0/96   4
       precedence  ::1/128        50
       precedence  ::/0           40
       precedence  2002::/16      30
       precedence  ::/96          20
       precedence  ::ffff:0:0/96  10

There is no support for anything like RFC 3484 in the obsolete name lookup interfaces (gethostbyname() etc) and there never will be. The implementation of RFC 3484 adds some overhead. To determine the source address for each destination address a socket must be created and a UDP connection initiated. In the absence of a dedicated kernel interface that is the best solution. This work cannot be cached in nscd. The daemon only caches the raw information. The sorting always has to happen because the situation of the machine (the available interfaces etc) can change over time.
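The source address probe mentioned above can be sketched as follows. The helper probe_source_address() is hypothetical and only illustrates the technique: connect() on a UDP socket sends no packet, it merely makes the kernel pick a route and a source address, which getsockname() then reports.

```c
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: ask the kernel which source address it would use
   to reach DST.  Returns 0 and fills *SRC and *SRCLEN on success, -1 if
   no source address can be determined.  */
int
probe_source_address (const struct sockaddr *dst, socklen_t dstlen,
                      struct sockaddr_storage *src, socklen_t *srclen)
{
  int fd = socket (dst->sa_family, SOCK_DGRAM, IPPROTO_UDP);
  if (fd == -1)
    return -1;

  int result = -1;
  /* No packet is sent here; the kernel only selects route and source.  */
  if (connect (fd, dst, dstlen) == 0)
    {
      memset (src, 0, sizeof *src);
      *srclen = sizeof *src;
      if (getsockname (fd, (struct sockaddr *) src, srclen) == 0)
        result = 0;
    }

  close (fd);
  return result;
}
```

A failure return corresponds to the case where no source address can be determined for a destination, which the sorting treats as less preferable. One socket per destination address is the overhead the text refers to.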

A lot of the information used by the sorting algorithm is not easy to come by. It is therefore highly recommended that no program uses home-grown implementations of getaddrinfo(). Just use the system implementation.