2018-03-18

Linux Raw Sockets

Recently I wrote a userspace implementation of the Host Identity Protocol (HIPv2, RFC 7401) with the upcoming Diet Exchange (HIP DEX, IETF draft 6). Doing so, I learnt a lot about raw socket programming under Linux, and here I want to share a few things with you.

I assume you have already worked with network sockets before. If not, don't fear: it's not that hard and there are plenty of nice introductions out there. I can, for example, recommend Beej's Guide to Network Programming. For this article I'll start with a normal UDP/TCP based socket and work my way down the layers. We open a traditional socket with:

sockfd = socket(AF_INET, SOCK_DGRAM, 0);

This will open a UDP based datagram socket via IPv4. The first argument of socket() specifies the domain of your socket, in our case the Internet Protocol (version 4). Sometimes you will see AF_… here and sometimes PF_…; this doesn't matter, they are the same. PF stands for protocol family, while AF is short for address family. Historically it was thought that in the future there might be multiple protocol families sharing the same address family, but this never happened. So the correct way would be to use PF_INET in the socket call and AF_INET in your struct sockaddr_in, but most people nowadays use the address family everywhere. With the second argument, type, we specify whether we want a connection-based protocol like TCP (SOCK_STREAM) or a connectionless protocol like UDP (SOCK_DGRAM). The third argument, protocol, specifies which protocol we actually want to use. We could set UDP or TCP here (IPPROTO_UDP, IPPROTO_TCP), but setting 0 works too: this selects the default protocol for the combination of domain and type, which is UDP for AF_INET with SOCK_DGRAM and TCP for AF_INET with SOCK_STREAM. You might also see IPPROTO_IP as the protocol, which is by definition simply 0. But the above variant seems to be the most common one.

But hey, it's 2018, so why the heck should we limit ourselves to IPv4? Luckily it's easy enough to support IPv6: just replace AF_INET by AF_INET6 and it will work with both IPv4 and IPv6 (IPv4 peers then show up as IPv4-mapped IPv6 addresses). So don't you dare use AF_INET anymore without a good excuse. By the way: if you want IPv6 only, you can set the socket option IPV6_V6ONLY.
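Here is a minimal sketch of how the IPV6_V6ONLY switch could be used. The function name open_udp6 is my own invention. Setting the option explicitly in both cases is a design choice: the system-wide default can be changed by the administrator, so I would not rely on it.

```c
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: open a UDP socket over IPv6.
 * v6only == 0: the socket also handles IPv4 (as IPv4-mapped addresses).
 * v6only == 1: the socket handles IPv6 only.
 * Returns the descriptor, or -1 on error. */
int open_udp6(int v6only)
{
    int fd = socket(AF_INET6, SOCK_DGRAM, 0);
    if (fd < 0)
        return -1;

    if (setsockopt(fd, IPPROTO_IPV6, IPV6_V6ONLY,
                   &v6only, sizeof(v6only)) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```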

But we don't want to talk about ordinary TCP/UDP sockets here! So let's dig down into the mysterious world of raw sockets.

The first thing I want to note is: you'll need superuser rights to create a raw socket, or more precisely the CAP_NET_RAW capability; otherwise you'll get the error "Operation not permitted" (EPERM).

sockfd = socket(AF_INET, SOCK_RAW, IPPROTO_UDP);
sockfd = socket(AF_INET6, SOCK_RAW, IPPROTO_UDP);

The first kind of raw socket we look at is what you get by setting type to SOCK_RAW while still setting protocol to TCP or UDP. You will still only receive the type of packet specified (here UDP), but this time you will receive not only the data but also the layer 4 (TCP/UDP) header, and you are also responsible for setting the layer 4 header yourself when sending.
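To give an idea of what "setting the layer 4 header yourself" means, here is a sketch that assembles a UDP header plus payload in a buffer. The function name build_udp is hypothetical. I assume IPv4 here, where a UDP checksum of 0 means "no checksum"; over IPv6 the checksum is mandatory, so this shortcut would not do there.

```c
#include <arpa/inet.h>
#include <netinet/udp.h>
#include <stdint.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical helper: write a UDP header followed by the payload into buf.
 * Ports are given in host byte order. Returns the total length, or -1 if
 * the datagram does not fit into buf or into the 16-bit length field. */
ssize_t build_udp(uint8_t *buf, size_t cap, uint16_t sport, uint16_t dport,
                  const void *payload, size_t plen)
{
    struct udphdr h;
    size_t total = sizeof(h) + plen;

    if (total > cap || total > UINT16_MAX)
        return -1;

    memset(&h, 0, sizeof(h));
    h.uh_sport = htons(sport);
    h.uh_dport = htons(dport);
    h.uh_ulen  = htons((uint16_t)total); /* header + payload */
    h.uh_sum   = 0;                      /* assumption: IPv4, no checksum */

    memcpy(buf, &h, sizeof(h));
    memcpy(buf + sizeof(h), payload, plen);
    return (ssize_t)total;
}
```

The resulting buffer is what you would hand to sendto() on the raw socket, together with the destination address in a struct sockaddr_in.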

Contrary to the above, here the choice of domain matters a lot. First of all, AF_INET6 will only receive IPv6 here and not both! Second, what you get when you read from the socket differs: if you read from the first variant with AF_INET, you will get the IPv4 header, the UDP/TCP header and the data; in the second variant, your read will yield only the UDP/TCP header and the data, but not the IPv6 header!
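In practice this means that with AF_INET you first have to skip the IPv4 header, whose length is variable because of IP options. A small sketch of that step (the function name udp_offset_ipv4 is made up for illustration):

```c
#include <netinet/in.h>
#include <netinet/ip.h>
#include <netinet/udp.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: given a buffer read from an
 * AF_INET / SOCK_RAW / IPPROTO_UDP socket, return the offset at which the
 * UDP header starts, or -1 if the buffer does not look like IPv4 + UDP. */
int udp_offset_ipv4(const uint8_t *buf, size_t len)
{
    struct iphdr ip;

    if (len < sizeof(ip))
        return -1;
    memcpy(&ip, buf, sizeof(ip));          /* avoids unaligned access */

    size_t ip_len = (size_t)ip.ihl * 4;    /* IHL counts 32-bit words */
    if (ip.version != 4 || ip_len < sizeof(ip))
        return -1;
    if (len < ip_len + sizeof(struct udphdr))
        return -1;

    return (int)ip_len;
}
```

For a socket opened with AF_INET6 no such step is needed: the data you read already starts with the UDP header.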

The third important difference between AF_INET and AF_INET6 for raw sockets is the endianness: unlike with IPv4 raw sockets, all data sent via IPv6 raw sockets must be in network byte order, and all data received via IPv6 raw sockets will be in network byte order.

If you want to send something through the socket, your packet has to include the layer 4 header but not the IP header. (Note: this is unspecified in POSIX, but I focus on Linux here.) But what if we want to change something in the IP header? For IPv4 there are two options: you can set the desired field(s) via calls to setsockopt(), or, if you want to build the full header on your own, you can use the socket option IP_HDRINCL to tell the kernel that you will construct the header yourself and write both header and payload to the socket:

sockfd = socket(AF_INET, SOCK_RAW, IPPROTO_UDP);
int on = 1;
setsockopt(sockfd, IPPROTO_IP, IP_HDRINCL, &on, sizeof(on));

Even if you use this, you won't have to deal with the source address and the packet ID: the kernel will fill them in for you if you leave them all zero. The IP checksum and the total length field will be set by the kernel whether you want it or not.
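A header for an IP_HDRINCL socket could thus be prepared roughly like this. The function name fill_ipv4_header and the TTL of 64 are my own choices for this sketch, not something the kernel demands.

```c
#include <netinet/in.h>
#include <netinet/ip.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: fill a minimal IPv4 header (no options) for use with
 * IP_HDRINCL. dst must already be in network byte order, for example as
 * produced by inet_pton(). */
void fill_ipv4_header(struct iphdr *ip, uint32_t dst, uint8_t protocol)
{
    memset(ip, 0, sizeof(*ip));
    ip->version  = 4;
    ip->ihl      = 5;        /* 5 * 4 = 20 bytes, i.e. no IP options */
    ip->ttl      = 64;       /* assumption: an arbitrary but common value */
    ip->protocol = protocol; /* e.g. IPPROTO_UDP */
    ip->daddr    = dst;

    /* Deliberately left at zero:
     *   id, saddr      -> filled in by the kernel when zero
     *   check, tot_len -> always set by the kernel */
}
```

The payload (including the layer 4 header) follows directly after these 20 bytes in the buffer you write to the socket.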

What's important here: IPv6 doesn't have IP_HDRINCL or a direct equivalent, as per RFC 3542 section 3. You can, however, also set various parameters via setsockopt(). Alternatively, the IPv6 advanced socket API employs another framework called "ancillary data". For outgoing packets one can set the majority of the header fields as well as the supported extension headers via ancillary data, and for received packets the majority of the fields and extension headers can be read with the same framework. A full description of ancillary data is out of the scope of this article, but the basic idea is this: for received packets you tell the kernel via setsockopt() which values you want to get and then read them from the control part of a struct msghdr filled by recvmsg(); for outgoing packets you write the values for the header fields and the actual data into a struct msghdr and send it via sendmsg().
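Without going into the details, the sending side could look roughly like the following sketch, which attaches a hop limit to a struct msghdr as ancillary data. The function name attach_hop_limit is hypothetical; the CMSG_* macros and IPV6_HOPLIMIT come from the advanced socket API.

```c
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: store an IPV6_HOPLIMIT control message in ctrl and
 * hook it into msg. ctrl must be suitably aligned for a struct cmsghdr and
 * at least CMSG_SPACE(sizeof(int)) bytes long.
 * Returns 0 on success, -1 if ctrl is too small. */
int attach_hop_limit(struct msghdr *msg, void *ctrl, size_t ctrl_len,
                     int hop_limit)
{
    if (ctrl_len < CMSG_SPACE(sizeof(int)))
        return -1;

    memset(ctrl, 0, ctrl_len);
    msg->msg_control    = ctrl;
    msg->msg_controllen = CMSG_SPACE(sizeof(int));

    struct cmsghdr *cmsg = CMSG_FIRSTHDR(msg);
    cmsg->cmsg_level = IPPROTO_IPV6;
    cmsg->cmsg_type  = IPV6_HOPLIMIT;
    cmsg->cmsg_len   = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &hop_limit, sizeof(hop_limit));
    return 0;
}
```

Before calling sendmsg() you would additionally point msg_name at the destination address and msg_iov at the payload.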

If you want to send data with a transport protocol which has no user interface, you can set the protocol argument to IPPROTO_RAW:

sockfd = socket(AF_INET, SOCK_RAW, IPPROTO_RAW);

This will automatically set IP_HDRINCL and allow you to send your data with arbitrary layer 4 protocols. The most common use: sending ICMP packets. Receiving data is, however, not possible with this type of socket!
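As an illustration of building such a layer 4 packet by hand, here is a sketch of an ICMP echo request including the Internet checksum (RFC 1071). The names inet_checksum and build_icmp_echo are my own. Keep in mind that on an IPPROTO_RAW socket an IPv4 header would still have to precede this in the buffer, since IP_HDRINCL is implied.

```c
#include <arpa/inet.h>
#include <netinet/ip_icmp.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/types.h>

/* Internet checksum (RFC 1071) over len bytes, read as big-endian 16-bit
 * words. The result is a host-order number; store it with htons().
 * Meant for packet-sized inputs (the 32-bit sum is not guarded further). */
uint16_t inet_checksum(const void *data, size_t len)
{
    const uint8_t *p = data;
    uint32_t sum = 0;

    while (len > 1) {
        sum += ((uint32_t)p[0] << 8) | p[1];
        p += 2;
        len -= 2;
    }
    if (len == 1)                       /* odd length: pad with a zero byte */
        sum += (uint32_t)p[0] << 8;
    while (sum >> 16)                   /* fold the carries back in */
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;
}

/* Hypothetical helper: write an ICMP echo request into buf.
 * Returns the total length, or -1 if buf is too small. */
ssize_t build_icmp_echo(uint8_t *buf, size_t cap, uint16_t id, uint16_t seq,
                        const void *payload, size_t plen)
{
    struct icmphdr h;
    size_t total = sizeof(h) + plen;

    if (total > cap)
        return -1;

    memset(&h, 0, sizeof(h));           /* checksum field is 0 for now */
    h.type = ICMP_ECHO;
    h.un.echo.id = htons(id);
    h.un.echo.sequence = htons(seq);
    memcpy(buf, &h, sizeof(h));
    memcpy(buf + sizeof(h), payload, plen);

    /* The checksum covers ICMP header and payload. */
    uint16_t sum = htons(inet_checksum(buf, total));
    memcpy(buf + offsetof(struct icmphdr, checksum), &sum, sizeof(sum));
    return (ssize_t)total;
}
```

A handy sanity check: running inet_checksum() over a finished packet, checksum field included, has to give 0.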

So far we have gained full control over layer 4 and partial control over layer 3. It's time to step down one further level into the dungeon.

sockfd = socket(AF_PACKET, SOCK_DGRAM, htons(ETHERTYPE_IPV6));

This is called a packet socket. It allows you to receive and send raw packets at the device driver level (layer 2). In the above version we used the protocol argument to specify that we only want to receive IPv6 packets. We can drop this requirement to receive all packets, no matter if they are IPv4, IPv6 or something else:

sockfd = socket(AF_PACKET, SOCK_DGRAM, htons(ETH_P_ALL));

By default, a packet socket will receive all packets matching the protocol, from every interface. To get packets from one specific interface only, you can use bind() to bind the packet socket to that interface.
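For bind() on a packet socket the address is a struct sockaddr_ll, of which only the family, the protocol and the interface index matter. A small sketch (the helper name make_bind_addr is made up):

```c
#include <arpa/inet.h>
#include <linux/if_packet.h>
#include <net/ethernet.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: build the address used to bind a packet socket to
 * one interface. ethertype is given in host byte order, e.g. ETH_P_ALL. */
struct sockaddr_ll make_bind_addr(int ifindex, uint16_t ethertype)
{
    struct sockaddr_ll sll;

    memset(&sll, 0, sizeof(sll));
    sll.sll_family   = AF_PACKET;
    sll.sll_protocol = htons(ethertype);
    sll.sll_ifindex  = ifindex;
    return sll;
}
```

A caller could look up the index with if_nametoindex() from <net/if.h> (the interface name, say "eth0", depends on your system) and then pass the result to bind(sockfd, (struct sockaddr *)&sll, sizeof(sll)).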

Setting type to SOCK_DGRAM results in the cooked mode: when reading from the socket you will get the packet without the MAC header, but you can comfortably obtain the MAC addresses by using recvfrom(), and likewise you can use sendto() to specify the destination via the struct sockaddr_ll. Alternatively we can set type to SOCK_RAW:
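In cooked mode, recvfrom() stores the sender's link-layer address in the struct sockaddr_ll you pass in. As a sketch of how to use it, this hypothetical helper (format_mac is my own name) turns that address into the usual textual form:

```c
#include <linux/if_packet.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: format the link-layer address found in sll as
 * "aa:bb:cc:dd:ee:ff". Returns 0 on success, -1 if the address is not
 * 6 bytes long or out is too small (18 bytes are needed). */
int format_mac(const struct sockaddr_ll *sll, char *out, size_t out_len)
{
    if (sll->sll_halen != 6 || out_len < 18)
        return -1;

    const unsigned char *a = sll->sll_addr;
    snprintf(out, out_len, "%02x:%02x:%02x:%02x:%02x:%02x",
             a[0], a[1], a[2], a[3], a[4], a[5]);
    return 0;
}
```

You would call it right after a recvfrom(sockfd, buf, sizeof(buf), 0, (struct sockaddr *)&sll, &sll_len) that returned successfully.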

sockfd = socket(AF_PACKET, SOCK_RAW, htons(ETH_P_ALL));

This is the lowest we can get: this way Ethernet frames are passed from the device driver to your application without any changes, including the full layer 2 header. Likewise, when writing to the socket, the user-supplied buffer has to contain all the headers of layers 2 to 4.
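Composing such a buffer starts with the 14-byte Ethernet header. A sketch using struct ether_header from <net/ethernet.h> follows; the function name build_eth_frame is hypothetical. As far as I know the frame check sequence is not part of the buffer, since it is added further down the stack.

```c
#include <arpa/inet.h>
#include <net/ethernet.h>
#include <stdint.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical helper: write an Ethernet header followed by the payload
 * (the layer 3 packet) into buf. ethertype is given in host byte order,
 * e.g. ETHERTYPE_IPV6. Returns the total length, or -1 if buf is too small. */
ssize_t build_eth_frame(uint8_t *buf, size_t cap,
                        const uint8_t dst[ETH_ALEN],
                        const uint8_t src[ETH_ALEN],
                        uint16_t ethertype,
                        const void *payload, size_t plen)
{
    struct ether_header eh;
    size_t total = sizeof(eh) + plen;

    if (total > cap)
        return -1;

    memcpy(eh.ether_dhost, dst, ETH_ALEN);
    memcpy(eh.ether_shost, src, ETH_ALEN);
    eh.ether_type = htons(ethertype);

    memcpy(buf, &eh, sizeof(eh));
    memcpy(buf + sizeof(eh), payload, plen);
    return (ssize_t)total;
}
```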

This is the deepest we can go in userspace. At this point we have full control of the complete Ethernet frame. I hope you enjoyed our journey down the rabbit hole.


Tags: linux c network