
RAW (4)

Linux IPv4 RAW sockets

SYNOPSIS

    #include <sys/socket.h>
    #include <netinet/in.h>

    raw_socket = socket(PF_INET, SOCK_RAW, protocol);

DESCRIPTION

    RAW sockets allow users to implement their own protocols on top of IPv4. All packets or errors matching the protocol specified for the raw socket are passed to this socket. For a list of the allowed protocols see the RFC 1700 assigned numbers and getprotobyname (3). protocol is in network order. If the protocol is IPPROTO_RAW, the IP_HDRINCL option is enabled on the socket and only sending is allowed.

    Only processes with an effective user ID of 0 or with the CAP_NET_RAW capability are allowed to open raw sockets.
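
    The privilege check above can be observed directly from the return value of socket (2). The following is a minimal sketch, not a definitive implementation; the helper name open_raw_socket is an assumption made for illustration.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>

/* Hypothetical helper: try to open a raw socket for one IP protocol.
 * Returns the descriptor, or -1 with errno left as set by socket(2). */
int open_raw_socket(int protocol)
{
    int fd = socket(PF_INET, SOCK_RAW, protocol);
    if (fd < 0) {
        int saved = errno;  /* fprintf may overwrite errno */
        if (saved == EPERM || saved == EACCES)
            fprintf(stderr, "raw socket needs uid 0 or CAP_NET_RAW: %s\n",
                    strerror(saved));
        else
            fprintf(stderr, "socket: %s\n", strerror(saved));
        errno = saved;
    }
    return fd;
}
```

    A caller would typically pass a constant such as IPPROTO_ICMP and treat a return of -1 with errno set to EPERM as a missing privilege rather than a programming error.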

    The data passed by the user is appended to an IP header (unless the IP_HDRINCL flag is set, in which case the user must supply the IP header) and sent to the specified destination address. Raw sockets use the standard sockaddr_in address structure defined in ip (4). The sin_port field can be used to specify the protocol number; otherwise the protocol specified in the initial socket (2) call is used. For incoming packets sin_port is set to the protocol of the packet.

    When the IP_HDRINCL socket option is enabled on a socket, no IP header is generated on sending. The user must place their own IP header in front of the packet data.

    IP header fields modified on sending when IP_HDRINCL is specified
    (Sending fragments with IP_HDRINCL is not supported currently.)

    IP Checksum       Always filled in.
    Source Address    Filled in when zero.
    Packet Id         Filled in when passed as 0.
    Total Length      Always filled in.

    If IP_HDRINCL is specified and the IP header has a non-zero destination address, the destination address of the socket is used to route the packet. When MSG_DONTROUTE is specified, the destination address must refer to a local interface; otherwise a routing table lookup is done.
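
    As an illustration of sending with IP_HDRINCL, the sketch below builds a 20-byte header without options and leaves zero every field that the kernel is described above as filling in. The function names, the 1500-byte buffer and the TTL of 64 are assumptions chosen for the example; struct iphdr comes from <netinet/ip.h>.

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/ip.h>
#include <arpa/inet.h>

/* Hypothetical helper: write an IPv4 header (no options) at buf.
 * dst_addr_be is the destination address in network byte order.
 * Returns the header length in bytes. */
size_t build_ip_header(unsigned char *buf, uint8_t protocol,
                       uint32_t dst_addr_be, uint8_t ttl)
{
    struct iphdr ip;

    memset(&ip, 0, sizeof ip);
    ip.version = 4;
    ip.ihl = 5;               /* 5 * 4 = 20 bytes, no IP options */
    ip.ttl = ttl;
    ip.protocol = protocol;
    ip.daddr = dst_addr_be;
    /* tot_len and check stay 0: always filled in by the kernel.
     * saddr and id stay 0: filled in because they are zero. */
    memcpy(buf, &ip, sizeof ip);
    return sizeof ip;
}

/* Hypothetical helper: enable IP_HDRINCL on fd, then send the header
 * followed by the payload. dst is also what the kernel routes on. */
ssize_t send_with_own_header(int fd, const struct sockaddr_in *dst,
                             uint8_t protocol,
                             const void *payload, size_t len)
{
    unsigned char packet[1500];
    int on = 1;
    size_t hlen;

    if (len > sizeof packet - sizeof(struct iphdr)) {
        errno = EMSGSIZE;     /* fragments are not supported here */
        return -1;
    }
    if (setsockopt(fd, IPPROTO_IP, IP_HDRINCL, &on, sizeof on) < 0)
        return -1;
    hlen = build_ip_header(packet, protocol, dst->sin_addr.s_addr, 64);
    memcpy(packet + hlen, payload, len);
    return sendto(fd, packet, hlen + len, 0,
                  (const struct sockaddr *)dst, sizeof *dst);
}
```

    The header builder is kept separate from the send call so that it can be checked without a privileged socket.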

    Other IP header fields can be set in the usual way with the standard IP control messages, such as IP_TTL for the time-to-live field, IP_TOS for the TOS field, IP_PKTINFO for the interface, and IP_OPTIONS for IP options. See ip (4) for more information. These options are illegal when IP_HDRINCL is set.
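
    For example, the time-to-live and TOS fields can be set without building a header by hand. This sketch uses setsockopt (2), which applies the values to every packet sent on the socket; per-packet control messages via sendmsg (2) are not shown. The helper name set_ttl_and_tos is an assumption.

```c
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/ip.h>

/* Hypothetical helper: set the TTL and TOS used for outgoing packets.
 * Returns 0 on success, -1 on error with errno set by setsockopt(2). */
int set_ttl_and_tos(int fd, int ttl, int tos)
{
    if (setsockopt(fd, IPPROTO_IP, IP_TTL, &ttl, sizeof ttl) < 0)
        return -1;
    if (setsockopt(fd, IPPROTO_IP, IP_TOS, &tos, sizeof tos) < 0)
        return -1;
    return 0;
}
```

    A call such as set_ttl_and_tos(fd, 32, IPTOS_LOWDELAY) would request a TTL of 32 and low-delay service; IPTOS_LOWDELAY is defined in <netinet/ip.h>.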

    In Linux 2.2 all IP header fields and options can be sent and received using IP control messages. This means raw sockets are only needed for new protocols or protocols with no user interface (like ICMP). Generation of custom TCP or UDP packets using raw sockets is unnecessary in many cases.

    When a packet is received, Linux first checks whether a raw socket has been bound to the protocol of the packet. If so, the packet is first passed to the raw socket(s) and then passed to other receivers of this protocol (e.g. kernel protocol modules).

SOCKET OPTIONS

    Raw socket options can be set with setsockopt (2) and read with getsockopt (2) by passing the SOL_RAW family flag.

    ICMP_FILTER Enable a special filter for raw sockets bound to the IPPROTO_ICMP protocol. The value passed is a long-word bit mask in which each bit represents an ICMP type. Incoming ICMP messages whose type equals the number of a bit set in this mask are not passed to the socket. This can be used to filter out uninteresting ICMP messages. The default is to pass all ICMP messages.
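
    A sketch of building and installing such a mask follows. Because a set bit means "do not deliver", the mask starts with all bits set and clears one bit per wanted type. The helper names are assumptions; the fallback definition of struct icmp_filter mirrors the one in the kernel header <linux/icmp.h> and is used only when that header has not been included.

```c
#include <stddef.h>
#include <stdint.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/ip_icmp.h>

#ifndef ICMP_FILTER
#define ICMP_FILTER 1
struct icmp_filter { uint32_t data; };
#endif

/* Hypothetical helper: return a mask that blocks every ICMP type
 * from 0 to 31 except the ones listed in types[]. */
uint32_t icmp_mask_allow_only(const int *types, size_t count)
{
    uint32_t mask = 0xFFFFFFFFu;   /* every bit set: block everything */
    size_t i;

    for (i = 0; i < count; i++)
        if (types[i] >= 0 && types[i] < 32)
            mask &= ~(UINT32_C(1) << types[i]);   /* clear bit: deliver */
    return mask;
}

/* Hypothetical helper: install the mask on a raw IPPROTO_ICMP socket. */
int apply_icmp_filter(int fd, uint32_t mask)
{
    struct icmp_filter filt;

    filt.data = mask;
    return setsockopt(fd, SOL_RAW, ICMP_FILTER, &filt, sizeof filt);
}
```

    A ping-like program that only cares about echo replies and destination-unreachable errors would pass ICMP_ECHOREPLY and ICMP_DEST_UNREACH in types[].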

    Additionally, raw sockets support all ip (4) SOL_IP socket flags. One SOL_IP socket flag specific to raw sockets is the IP_HDRINCL flag. When this flag is enabled the user has to supply the IP header. Linux does not change this header except to fill in the fields listed for IP_HDRINCL in the DESCRIPTION.

NOTES

    Linux never changes headers passed from the user except for filling in some zeroed fields as described for IP_HDRINCL. This differs from many other BSD sockets implementations.

    Raw sockets fragment a packet when its total length exceeds the interface MTU. A better, more network-friendly alternative is to use path MTU discovery. If the IP_MTU_DISCOVER option is enabled, the network stack automatically saves the MTU of targets that have been successfully communicated with in the routing cache. While path MTU discovery is in progress, packets may be dropped if the initial MTU guess was too large. The application has to implement its own retransmit strategy to handle this situation (packets may always be dropped for other reasons too, so this has to be handled anyway).

    When the socket is connected to a specific peer with connect (2), the path MTU can be retrieved conveniently using the IP_MTU socket option after an EMSGSIZE error has occurred. For connectionless sockets with many destinations the new MTU can be accessed using the error queue (see IP_RECVERR in ip (4)). The application should then lower its packet sizes.
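
    The following sketch shows the two socket calls involved for a connected socket. The Linux headers spell the option IP_MTU_DISCOVER; the choice of IP_PMTUDISC_DO (always set the don't-fragment flag) and both helper names are assumptions made for the example.

```c
#include <sys/socket.h>
#include <netinet/in.h>

/* Hypothetical helper: turn on path MTU discovery for fd.
 * Returns 0 on success, -1 on error. */
int enable_pmtu_discovery(int fd)
{
    int mode = IP_PMTUDISC_DO;

    return setsockopt(fd, IPPROTO_IP, IP_MTU_DISCOVER,
                      &mode, sizeof mode);
}

/* Hypothetical helper: read the currently known path MTU of a socket
 * that has been connected with connect(2), e.g. after a send failed
 * with EMSGSIZE. Returns the MTU in bytes, or -1 on error. */
int query_path_mtu(int fd)
{
    int mtu = 0;
    socklen_t len = sizeof mtu;

    if (getsockopt(fd, IPPROTO_IP, IP_MTU, &mtu, &len) < 0)
        return -1;
    return mtu;
}
```

    After an EMSGSIZE error the application would call query_path_mtu, shrink its packets to fit, and retransmit.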