6. Unresolved Symbols

The most common and most frustrating failure in loading an LKM is a bunch of error messages about unresolved symbols, like this:
  msdos.o: unresolved symbol fat_date_unix2dos
  msdos.o: unresolved symbol fat_add_cluster1
  msdos.o: unresolved symbol fat_put_super
  ...
There are actually a bunch of different problems that result in this symptom. In any case, you can get closer to the problem by looking at /proc/ksyms and confirming that the symbols in the message are indeed not in the list.
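The lookup described above can be sketched in C. This is only an illustration: the function name ksyms_has_symbol is made up for the example, and it assumes each line of the symbol list has the form "<hex address> <name>", optionally followed by a module name in brackets.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Return 1 if `symbol` appears as the name field of some line read from
 * `ksyms`, 0 otherwise.  Assumed line layout (not taken from the text):
 * "<hex address> <name>", optionally followed by "[module]". */
int ksyms_has_symbol(FILE *ksyms, const char *symbol)
{
    char line[512];
    char name[256];

    assert(ksyms != NULL && symbol != NULL);
    while (fgets(line, sizeof line, ksyms) != NULL) {
        /* Skip the address field, then read the symbol name. */
        if (sscanf(line, "%*s %255s", name) == 1 &&
            strcmp(name, symbol) == 0)
            return 1;
    }
    return 0;
}
```

In a real program you would pass the result of fopen("/proc/ksyms", "r") and one of the names from the error message, for example "fat_put_super".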

6.1. Some LKMs Prerequire Other LKMs

One reason you get this is that you have not loaded another LKM that contains instructions or data that your LKM needs to access. A primary purpose of modprobe is to avoid this failure. See Section 5.3.

6.2. An LKM Must Match The Base Kernel

The designers of loadable kernel modules realized there would be a problem with having the kernel in multiple files, possibly distributed independently of one another. What if the LKM mydriver.o was written and compiled to work with the Linux 1.2.1 base kernel, and then someone tried to load it into a Linux 1.2.2 kernel? What if there was a change between 1.2.1 and 1.2.2 in the way a kernel subroutine that mydriver.o calls works? These are internal kernel subroutines, so what's to stop them from changing from one release to the next? You could end up with a broken kernel.

To address this problem, the creators of LKMs endowed them with a kernel version number. The special .modinfo section of the mydriver.o object file in this example has "1.2.1" in it because it was compiled using header files from Linux 1.2.1. Try to load it into a 1.2.2 kernel and insmod notices the mismatch and fails, telling you you have a kernel version mismatch.
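As a rough illustration of this kind of check, the following C sketch compares a version string with the release of the running kernel, obtained with uname(). The function name built_for_running_kernel is made up for the example, and the exact string comparison is an assumption. This is not insmod's code, and it does not read the .modinfo section.

```c
#include <assert.h>
#include <string.h>
#include <sys/utsname.h>

/* Return 1 if `built_for` equals the running kernel's release string,
 * 0 if it differs, and -1 if uname() fails. */
int built_for_running_kernel(const char *built_for)
{
    struct utsname u;

    assert(built_for != NULL);
    if (uname(&u) != 0)
        return -1;
    return strcmp(built_for, u.release) == 0;
}
```

In the example from the text, built_for would be "1.2.1", and the function would return 0 when called on a system running Linux 1.2.2.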

But wait. What's the chance that there really is an incompatibility between Linux 1.2.1 and 1.2.2 that will affect mydriver.o? mydriver.o only calls a few subroutines and accesses a few data structures. Surely they don't change with every minor release. Must we recompile every LKM against the header files for the particular kernel into which we want to insert it?

To ease this burden, insmod has a -f option that "forces" insmod to ignore the kernel version mismatch and insert the module anyway. Because it is so unusual for there to be a significant difference between any two kernel versions, I recommend you always use -f. You will, however, still get a warning message about the mismatch. There's no way to shut that off.

But LKM designers still wanted to address the problem of incompatible changes that do occasionally happen. So they invented a very clever way to allow the LKM insertion process to be sensitive to the actual content of each kernel subroutine the LKM uses. It's called symbol versioning (or sometimes, less clearly, "module versioning"). It's optional, and you select it when you configure the kernel via the "CONFIG_MODVERSIONS" kernel configuration option.

When you build a base kernel or LKM with symbol versioning, the various symbols exported for use by LKMs get defined as macros. The definition of the macro is the same symbol name plus a hexadecimal hash value of the parameter and return value types for the subroutine named by the symbol (based on an analysis by the program genksyms of the source code for the subroutine). So let's look at the register_chrdev subroutine. register_chrdev is a subroutine in the base kernel that device driver LKMs often call. With symbol versioning, there is a C macro definition like

  #define register_chrdev register_chrdev_Rc8dc8350

This macro definition is in effect both in the C source file that defines register_chrdev and in any C source file that refers to register_chrdev, so while your eyes see register_chrdev as you read the code, the C preprocessor knows that the function is really called register_chrdev_Rc8dc8350.
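You can see the renaming at work in a small, self-contained C sketch. The register_chrdev defined here is only a stand-in with a simplified, hypothetical signature and body, not the kernel's function, and STR and linked_name are helpers invented for the example.

```c
#include <assert.h>
#include <string.h>

/* With symbol versioning, a header puts this macro in effect everywhere. */
#define register_chrdev register_chrdev_Rc8dc8350

/* Helpers that turn a macro-expanded identifier into a string. */
#define STR_(x) #x
#define STR(x) STR_(x)

/* Stand-in definition (hypothetical signature and body).  After
 * preprocessing, the function defined here is register_chrdev_Rc8dc8350. */
int register_chrdev(unsigned int major, const char *name)
{
    assert(name != NULL);
    return (major != 0 && strlen(name) > 0) ? 0 : -1;
}

/* The name the compiler and linker actually see for register_chrdev. */
const char *linked_name(void)
{
    return STR(register_chrdev);
}
```

Here linked_name() returns the string "register_chrdev_Rc8dc8350", and a call written as register_chrdev(42, "mydriver") is compiled as a call to register_chrdev_Rc8dc8350.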

What is the meaning of that garbage suffix? It is a hash of the data types of the parameters and return value of register_chrdev. Two different combinations of parameter and return value types are extremely unlikely to have the same hash value.

So let's say someone adds a parameter to register_chrdev between Linux 1.2.1 and Linux 1.2.2. In 1.2.1, register_chrdev is a macro for register_chrdev_Rc8dc8350, but in 1.2.2, it is a macro for register_chrdev_R12f8dc01. In mydriver.o, compiled with Linux 1.2.1 header files, there is an external reference to register_chrdev_Rc8dc8350, but there is no such symbol exported by the 1.2.2 base kernel. Instead, the 1.2.2 base kernel exports a symbol register_chrdev_R12f8dc01.

So if you try to insmod this 1.2.1 mydriver.o into this 1.2.2 base kernel, you will fail. And the error message isn't one about mismatched kernel versions, but simply "unresolved symbol reference."
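Since the error message doesn't say that the cause is a version difference, it can help to compare the unresolved name with the names the kernel does export. The following C sketch does that for one pair of names. The function names are invented for the example, and the assumed suffix layout ("_R" followed by eight hexadecimal digits) is inferred from the examples in this section. It does not handle any other suffix form.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Length of the base name of a symbol.  A versioned name is assumed to
 * end in "_R" plus 8 hex digits (10 characters); for anything else the
 * whole name counts as the base name. */
size_t base_name_length(const char *symbol)
{
    size_t len, i;

    assert(symbol != NULL);
    len = strlen(symbol);
    if (len <= 10 || symbol[len - 10] != '_' || symbol[len - 9] != 'R')
        return len;
    for (i = len - 8; i < len; i++)
        if (!isxdigit((unsigned char)symbol[i]))
            return len;
    return len - 10;
}

/* Return 1 if the two names have the same base name but are not the
 * same symbol, i.e. they differ only in their version suffix. */
int version_mismatch(const char *wanted, const char *exported)
{
    size_t wl = base_name_length(wanted);
    size_t el = base_name_length(exported);

    return wl == el &&
           strncmp(wanted, exported, wl) == 0 &&
           strcmp(wanted, exported) != 0;
}
```

For the example in the text, version_mismatch("register_chrdev_Rc8dc8350", "register_chrdev_R12f8dc01") returns 1. It also returns 1 when one name carries a suffix and the other does not, which is what you get when only one of the base kernel and the LKM was built with symbol versioning.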

As clever as this is, it actually works against you sometimes. The way genksyms works, it often generates different hash values for parameter lists that are essentially the same.

And symbol versioning doesn't even guarantee compatibility. It catches only a small subset of the kinds of changes in the definition of a function that can make it not backward compatible. If the way register_chrdev interprets one of its parameters changes in a non-backward-compatible way, its version suffix won't change -- the parameter still has the same C type.

And there's no way an option like -f on insmod can get around this.

So it is generally not wise to use symbol versioning.

Of course, if you have a base kernel that was compiled with symbol versioning, then you must have all your LKMs compiled likewise, and vice versa. Otherwise, you're guaranteed to get those "unresolved symbol reference" errors.

6.3. If You Run Multiple Kernels

Now that we've seen how you often have different versions of an LKM for different base kernels, the question arises as to what to do about a system that has multiple kernel versions (i.e. you can choose a kernel at boot time). You want to make sure that the LKMs built for Kernel A get inserted when you boot Kernel A, but the LKMs built for Kernel B get inserted when you boot Kernel B.

In particular, whenever you upgrade your kernel, if you're smart, you keep both the new kernel and the old kernel on the system until you're sure the new one works.

The most common way to do this is with the LKM-hunting feature of modprobe. modprobe understands the conventional LKM file organization described in Section 5.6 and loads LKMs from the appropriate subdirectory depending on the kernel that is running.

You set the uname --release value, which is the name of the subdirectory in which modprobe looks, by editing the main kernel makefile when you build the kernel and setting the VERSION, PATCHLEVEL, SUBLEVEL, and EXTRAVERSION variables at the top.
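The way the four makefile variables combine into the release value can be sketched in C as follows. The function names are invented for the example, the EXTRAVERSION value "-test" in the usage note is hypothetical, and module_dir assumes the conventional /lib/modules location for the per-kernel subdirectories.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Write "<version>.<patchlevel>.<sublevel><extraversion>" into buf.
 * Returns 0 on success, -1 if buf is too small or extraversion
 * contains a '/'. */
int kernel_release(char *buf, size_t size, int version, int patchlevel,
                   int sublevel, const char *extraversion)
{
    int n;

    assert(buf != NULL && extraversion != NULL);
    /* The result names a directory, so reject a path separator. */
    if (strchr(extraversion, '/') != NULL)
        return -1;
    n = snprintf(buf, size, "%d.%d.%d%s",
                 version, patchlevel, sublevel, extraversion);
    return (n >= 0 && (size_t)n < size) ? 0 : -1;
}

/* Write the directory searched for that release into buf.
 * Returns 0 on success, -1 if buf is too small. */
int module_dir(char *buf, size_t size, const char *release)
{
    int n;

    assert(buf != NULL && release != NULL);
    n = snprintf(buf, size, "/lib/modules/%s", release);
    return (n >= 0 && (size_t)n < size) ? 0 : -1;
}
```

With VERSION 2, PATCHLEVEL 4, SUBLEVEL 18 and EXTRAVERSION "-test", kernel_release produces "2.4.18-test" and module_dir produces "/lib/modules/2.4.18-test". Giving two builds of the same source a different EXTRAVERSION is enough to keep their LKMs in separate directories.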

6.4. SMP Symbols

Besides the checksum mentioned above, the symbol version prefix contains "smp" if the symbol is defined in or referenced by code that was built for symmetric multiprocessing (SMP) machines. That means it was built for use on a system that may have more than one CPU. You choose whether to build in SMP capability via the Linux kernel configuration process (make config, etc.), specifically with the CONFIG_SMP configuration option.

So if you use symbol versioning, you will get unresolved symbols if the base kernel was built with SMP capability and the LKM you're inserting was not, or vice versa.

If you don't use symbol versioning, never mind.

Note that there's generally no reason to omit SMP capability from a kernel, even if you have only one CPU. Just because the capability is there doesn't mean you have to have multiple CPUs. However, there are some machines on which the SMP-capable kernel will not boot because it reaches the conclusion that there are zero CPUs!

6.5. You Are Not Licensed To Access The Symbol

The copyright owners of some kernel code license their programs to the public to make and use copies, but only in restricted ways. For example, the license may say you may only call your copy of the program from a program which is similarly licensed to the public.

(Is that confusing? Here's an example: Bob writes an LKM that provides data compression subroutines to other LKMs. He licenses his program to the public under the GNU General Public License (GPL). According to some interpretations, that license says if you make a copy of Bob's LKM, you can't allow Mary's LKM to call its compression subroutines if Mary does not supply her source code to the world too. The idea is to encourage Mary to open up her source code.)

To support and enforce such a license, the licensor can cause his program to export symbols under a special name that is the real name of the symbol plus the prefix "GPLONLY_". A naive loader of a client LKM would not be able to resolve those symbols. Example: Bob's LKM provides the service bobsService() and declares it to be a GPL symbol. The LKM consequently exports bobsService() under the name GPLONLY_bobsService. If Mary's LKM refers to bobsService, the naive loader will not be able to find it, so will fail to load Mary's LKM.

However, a modern version of insmod knows to check for GPLONLY_bobsService if it can't find bobsService. But the modern insmod will refuse to do so unless Mary's LKM declares that it is licensed to the public under GPL.
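The lookup rule just described can be modeled with a short C sketch. This is a toy model written for this explanation, not insmod's code: the export list is a plain array of strings, the names exported and can_resolve are invented for the example, and the client's license is reduced to a single flag.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Return 1 if `name` is in the NULL-terminated export list. */
static int exported(const char *const *exports, const char *name)
{
    for (; *exports != NULL; exports++)
        if (strcmp(*exports, name) == 0)
            return 1;
    return 0;
}

/* Toy model of the lookup: try the plain name first, and fall back to
 * the GPLONLY_ name only when the client LKM declares a GPL license. */
int can_resolve(const char *const *exports, const char *name,
                int client_is_gpl)
{
    char gpl_name[256];
    int n;

    assert(exports != NULL && name != NULL);
    if (exported(exports, name))
        return 1;
    if (!client_is_gpl)
        return 0;
    n = snprintf(gpl_name, sizeof gpl_name, "GPLONLY_%s", name);
    if (n < 0 || (size_t)n >= sizeof gpl_name)
        return 0;   /* name too long for this sketch */
    return exported(exports, gpl_name);
}
```

With an export list that contains GPLONLY_bobsService, can_resolve(exports, "bobsService", 1) returns 1 and can_resolve(exports, "bobsService", 0) returns 0. A naive loader behaves as if client_is_gpl were always 0.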

The purpose of this appears to be to prevent anyone from accidentally violating a license (or from credibly claiming that he accidentally violated the license). It is not difficult to circumvent the restriction if you want to.

If you see this failure, it is probably because you're using an old loader (insmod) that doesn't know about GPLONLY.

The only other cause would be that the LKM author wrote the source code in such a way that it will never load into any Linux kernel, so there would be no point in the author distributing it.

6.6. An LKM Must Match Prerequisite LKMs

In the same ways that an LKM must be compatible with the base kernel, it must be compatible with any LKMs that it accesses (e.g. the first LKM calls a subroutine in the second). The preceding sections limit their discussion to the base kernel just to keep it simple.