6. Unresolved Symbols
The most common and most frustrating failure in loading an LKM is a bunch
of error messages about unresolved symbols, like this:
msdos.o: unresolved symbol fat_date_unix2dos
msdos.o: unresolved symbol fat_add_cluster1
msdos.o: unresolved symbol fat_put_super
...
There are actually a bunch of different problems that result in this
symptom. In any case, you can get closer to the problem by looking at
/proc/ksyms and confirming that the symbols in the
message are indeed not in the list.
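If you have many messages to check, you can script the comparison. Here is a minimal Python sketch of the idea. It is not part of any kernel tool. The function names are made up for this example, and it assumes that each line of /proc/ksyms is an address followed by a symbol name (optionally followed by a module name in brackets) and that the messages look like the ones shown above.

```python
import re

def parse_unresolved(insmod_output):
    """Collect symbol names from 'unresolved symbol NAME' messages."""
    return re.findall(r"unresolved symbol (\S+)", insmod_output)

def exported_symbols(ksyms_text):
    """Return the set of symbol names listed in /proc/ksyms-style text.

    Assumed line layout: '<address> <symbol> [<module>]'.
    """
    names = set()
    for line in ksyms_text.splitlines():
        fields = line.split()
        if len(fields) >= 2:
            names.add(fields[1])
    return names

def confirm_missing(insmod_output, ksyms_text):
    """Return the reported symbols that really are absent from the list."""
    exported = exported_symbols(ksyms_text)
    return [s for s in parse_unresolved(insmod_output) if s not in exported]

if __name__ == "__main__":
    messages = (
        "msdos.o: unresolved symbol fat_date_unix2dos\n"
        "msdos.o: unresolved symbol fat_add_cluster1\n"
    )
    # Made-up sample data standing in for the contents of /proc/ksyms.
    sample_ksyms = "c0141f50 fat_date_unix2dos\nc0118a30 printk\n"
    print(confirm_missing(messages, sample_ksyms))
```

On a real system you would pass the text read from /proc/ksyms instead of the sample string. A symbol that the loader reports as unresolved but that does appear in the list points to a different kind of problem than the ones described below.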
6.1. Some LKMs Prerequire Other LKMs
One reason you get this is that you have not loaded another
LKM that contains instructions or data that your LKM needs to access.
A primary purpose of modprobe is to avoid this
failure. See Section 5.3.
6.2. An LKM Must Match The Base Kernel
The designers of loadable kernel modules realized there would be a
problem with having the kernel in multiple files, possibly distributed
independently of one another. What if the LKM
mydriver.o was written and compiled to work with
the Linux 1.2.1 base kernel, and then someone tried to load it into a
Linux 1.2.2 kernel? What if there was a change between 1.2.1 and
1.2.2 in the way a kernel subroutine that
mydriver.o calls works? These are internal
kernel subroutines, so what's to stop them from changing from one
release to the next? You could end up with a broken kernel.
To address this problem, the creators of LKMs endowed them with a
kernel version number. The special .modinfo
section of the mydriver.o object file in this example has
"1.2.1" in it because it was compiled using header files from Linux
1.2.1. Try to load it into a 1.2.2 kernel and
insmod notices the mismatch and fails,
telling you that you have a kernel version mismatch.
But wait. What's the chance that there really is an incompatibility
between Linux 1.2.1 and 1.2.2 that will affect
mydriver.o? mydriver.o only
calls a few subroutines and accesses a few data structures. Surely
they don't change with every minor release. Must we recompile every
LKM against the header files for the particular kernel into which we
want to insert it?
To ease this burden, insmod has a
-f option that "forces"
insmod to ignore the kernel version
mismatch and insert the module anyway. Because it is so unusual for
there to be a significant difference between any two kernel versions,
I recommend you always use -f. You will, however,
still get a warning message about the mismatch. There's no way to
shut that off.
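The version check and the effect of -f amount to a simple decision, which the following Python sketch models. It is only an illustration of the logic described above, not the code of insmod: the function name is hypothetical and the message texts are invented.

```python
def check_kernel_version(module_version, running_version, force=False):
    """Mimic the version check described above.

    Returns (loadable, message). The message texts are illustrative,
    not the literal output of insmod.
    """
    if module_version == running_version:
        return True, ""
    message = (
        "kernel version mismatch: module built for %s, running kernel is %s"
        % (module_version, running_version)
    )
    if force:
        # With -f the module is inserted anyway, but the warning remains.
        return True, "warning: " + message
    return False, "error: " + message

if __name__ == "__main__":
    print(check_kernel_version("1.2.1", "1.2.2"))
    print(check_kernel_version("1.2.1", "1.2.2", force=True))
```

Note that the check compares only the version strings. It knows nothing about whether the subroutines the LKM calls actually changed.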
But LKM designers still wanted to address the problem of incompatible
changes that do occasionally happen. So they invented a very clever
way to allow the LKM insertion process to be sensitive to the actual
content of each kernel subroutine the LKM uses. It's called symbol
versioning (or sometimes, less clearly, "module versioning"). It's
optional, and you select it when you configure the kernel via the
"CONFIG_MODVERSIONS" kernel configuration option.
When you build a base kernel or LKM with symbol versioning, the
various symbols exported for use by LKMs get defined as macros. The
definition of the macro is the same symbol name plus a hexadecimal
hash value of the parameter and return value types for the subroutine
named by the symbol (based on an analysis by the program
genksyms of the source code for the subroutine).
So let's look at the register_chrdev subroutine.
register_chrdev is a subroutine in the base
kernel that device driver LKMs often call. With symbol versioning,
there is a C macro definition like
#define register_chrdev register_chrdev_Rc8dc8350
This macro definition is in effect both in the C source file that
defines register_chrdev and in any C source file
that refers to register_chrdev, so while your
eyes see register_chrdev as you read the code,
the C preprocessor knows that the function is really called
register_chrdev_Rc8dc8350.
What is the meaning of that garbage suffix? It is a hash of the data
types of the parameters and return value of
register_chrdev. Two different combinations of
parameter and return value types are very unlikely to have the same
hash value.
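To make the idea of a type hash concrete, here is a small Python sketch that builds a versioned name from a signature. It is a stand-in under stated assumptions: zlib.crc32 over a hand-written signature string replaces whatever genksyms really computes, the function name is hypothetical, and the prototypes are invented for the example. The suffixes it prints will not match those of any real kernel.

```python
import zlib

def versioned_name(symbol, return_type, param_types):
    """Build 'symbol_R<8 hex digits>' from a hash of the type signature.

    zlib.crc32 stands in for whatever genksyms really computes, so the
    suffixes produced here will not match those of a real kernel.
    """
    signature = return_type + "(" + ",".join(param_types) + ")"
    checksum = zlib.crc32(signature.encode("ascii")) & 0xFFFFFFFF
    return "%s_R%08x" % (symbol, checksum)

if __name__ == "__main__":
    # Hypothetical prototypes, for illustration only.
    old = versioned_name("register_chrdev", "int",
                         ["unsigned int", "const char *"])
    new = versioned_name("register_chrdev", "int",
                         ["unsigned int", "const char *", "int"])
    print(old)
    print(new)
    print(old != new)
```

The point to take away is that the suffix depends only on the types. The same types always give the same suffix, and a change to the parameter list gives a different one.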
So let's say someone adds a parameter to
register_chrdev between Linux 1.2.1 and Linux
1.2.2. In 1.2.1, register_chrdev is a macro for
register_chrdev_Rc8dc8350, but in 1.2.2, it is a
macro for register_chrdev_R12f8dc01. In
mydriver.o, compiled with Linux 1.2.1 header
files, there is an external reference to
register_chrdev_Rc8dc8350, but there is no such
symbol exported by the 1.2.2 base kernel. Instead, the 1.2.2 base
kernel exports a symbol register_chrdev_R12f8dc01.
So if you try to insmod this 1.2.1 mydriver.o
into this 1.2.2 base kernel, you will fail. And the error message
isn't one about mismatched kernel versions, but simply "unresolved
symbol reference."
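Because the message does not say that versions are involved, it can help to compare names with their suffixes stripped. The Python sketch below does that for the example above. It is my own illustration, not something the loader does: the function names are hypothetical, and it assumes the suffix always has the form "_R" plus eight hexadecimal digits, as in the examples in this section.

```python
import re

# Assumed suffix layout, following the examples in the text:
# "_R" followed by 8 hexadecimal digits.
VERSION_SUFFIX = re.compile(r"_R[0-9a-f]{8}$")

def base_name(symbol):
    """Strip the version suffix, if any, from a symbol name."""
    return VERSION_SUFFIX.sub("", symbol)

def diagnose(references, exports):
    """Classify each symbol the module references.

    'resolved'         : the exact name is exported
    'version mismatch' : the same base name is exported under another name
    'missing'          : nothing with that base name is exported
    """
    exported_bases = {base_name(e) for e in exports}
    report = {}
    for ref in references:
        if ref in exports:
            report[ref] = "resolved"
        elif base_name(ref) in exported_bases:
            report[ref] = "version mismatch"
        else:
            report[ref] = "missing"
    return report

if __name__ == "__main__":
    module_refs = ["register_chrdev_Rc8dc8350"]      # mydriver.o, built for 1.2.1
    kernel_exports = {"register_chrdev_R12f8dc01"}   # exported by 1.2.2
    print(diagnose(module_refs, kernel_exports))
```

For the 1.2.1 mydriver.o and the 1.2.2 base kernel, this reports a version mismatch for register_chrdev rather than a symbol that is missing altogether.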
As clever as this is, it actually works against you sometimes. The
way genksyms works, it often generates different
hash values for parameter lists that are essentially the same.
And symbol versioning doesn't even guarantee compatibility. It
catches only a small subset of the kinds of changes in the definition
of a function that can make it not backward compatible. If the way
register_chrdev interprets one of its parameters
changes in a non-backward-compatible way, its version suffix won't
change -- the parameter still has the same C type.
And there's no way an option like -f on
insmod can get around this.
So it is generally not wise to use symbol versioning.
Of course, if you have a base kernel that was compiled with symbol
versioning, then you must have all your LKMs compiled likewise, and
vice versa. Otherwise, you're guaranteed to get those "unresolved
symbol reference" errors.
6.3. If You Run Multiple Kernels
Now that we've seen how you often have different versions of an LKM
for different base kernels, the question arises as to what to do
about a system that has multiple kernel versions (i.e. you can
choose a kernel at boot time). You want to make sure that the
LKMs built for Kernel A get inserted when you boot Kernel A, but
the LKMs built for Kernel B get inserted when you boot Kernel B.
In particular, whenever you upgrade your kernel, if you're smart,
you keep both the new kernel and the old kernel on the system
until you're sure the new one works.
The most common way to do this is with the LKM-hunting feature of
modprobe. modprobe
understands the conventional LKM file organization described in
Section 5.6 and loads LKMs from the appropriate
subdirectory depending on the kernel that is running.
You set the uname --release value, which is the
name of the subdirectory in which modprobe looks,
by editing the main kernel makefile when you build the kernel and
setting the VERSION, PATCHLEVEL, SUBLEVEL, and EXTRAVERSION variables
at the top.
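The relationship between the four makefile variables and the directory that gets searched can be sketched in a few lines of Python. This is an illustration only: the function names are hypothetical, the sample values are invented, and /lib/modules is assumed as the root of the conventional LKM file organization.

```python
import platform

def kernel_release(version, patchlevel, sublevel, extraversion=""):
    """Combine the four makefile variables into a release string."""
    return "%s.%s.%s%s" % (version, patchlevel, sublevel, extraversion)

def module_directory(release, root="/lib/modules"):
    """Directory searched for LKMs belonging to the given release.

    The root is an assumption; adjust it if your system differs.
    """
    return root.rstrip("/") + "/" + release

if __name__ == "__main__":
    # Hypothetical values for VERSION, PATCHLEVEL, SUBLEVEL and EXTRAVERSION.
    print(module_directory(kernel_release(2, 4, 20, "-test")))
    # The directory for the kernel that is running right now.
    print(module_directory(platform.release()))
```

Giving each kernel you build a distinct EXTRAVERSION is an easy way to keep the LKMs of two otherwise identical kernel versions apart.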
6.4. SMP symbols
Besides the checksum mentioned above, the symbol version prefix contains
"smp" if the symbol is defined in or referenced by code that was
built for symmetric multiprocessing (SMP) machines. That means it was
built for use on a system that may have more than one CPU. You choose
whether to build in SMP capability or not via the Linux kernel configuration
process (make config, etc.), to wit with the
CONFIG_SMP configuration option.
So if you use symbol versioning, you will get unresolved symbols if the
base kernel was built with SMP capability and the LKM you're inserting was
not, or vice versa.
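If you want to tell an SMP mismatch from an ordinary version mismatch, you can pick the versioned names apart. The Python sketch below assumes that the marker appears as "smp_" directly before the checksum, as in register_chrdev_Rsmp_c8dc8350. That layout, the sample names, and the function names are assumptions made for this example, so check them against the names your own kernel exports.

```python
import re

# Assumed layout: NAME_R[smp_]<8 hex digits>,
# for example register_chrdev_Rsmp_c8dc8350.
VERSIONED = re.compile(r"^(?P<name>.+)_R(?P<smp>smp_)?(?P<sum>[0-9a-f]{8})$")

def split_versioned(symbol):
    """Return (base name, built_for_smp, checksum), or None if unversioned."""
    m = VERSIONED.match(symbol)
    if m is None:
        return None
    return m.group("name"), m.group("smp") is not None, m.group("sum")

def smp_mismatch(reference, export):
    """True if both names are the same symbol but differ in SMP marking."""
    ref, exp = split_versioned(reference), split_versioned(export)
    if ref is None or exp is None:
        return False
    return ref[0] == exp[0] and ref[1] != exp[1]

if __name__ == "__main__":
    # Hypothetical names: an LKM built without SMP, a base kernel built with it.
    print(split_versioned("register_chrdev_Rsmp_c8dc8350"))
    print(smp_mismatch("register_chrdev_Rc8dc8350",
                       "register_chrdev_Rsmp_c8dc8350"))
```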
If you don't use symbol versioning, never mind.
Note that there's generally no reason to omit SMP capability from a
kernel, even if you have only one CPU. Just because the capability is
there doesn't mean you have to have multiple CPUs. However, there are
some machines on which the SMP-capable kernel will not boot because it
reaches the conclusion that there are zero CPUs!
6.5. You Are Not Licensed To Access The Symbol
The copyright owners of some kernel code license their programs to the
public to make and use copies, but only in restricted ways. For
example, the license may say you may only call your copy of the
program from a program which is similarly licensed to the public.
(Is that confusing? Here's an example: Bob writes an LKM that provides
data compression subroutines to other LKMs. He licenses his program
to the public under the GNU General Public License (GPL). According to some
interpretations, that license says if you make a copy of Bob's LKM,
you can't allow Mary's LKM to call its compression subroutines
if Mary does not supply her source code to the world too. The idea is to
encourage Mary to open up her source code).
To support and enforce such a license, the licensor can cause his
program to export symbols under a special name that is the real name
of the symbol plus the prefix "GPLONLY_". A naive loader
of a client LKM would not be able to resolve those symbols. Example:
Bob's LKM provides the service bobsService() and declares it to be
a GPL symbol. The LKM consequently exports bobsService() under the
name GPLONLY_bobsService. If Mary's LKM refers to bobsService, the
naive loader will not be able to find it, so will fail to load Mary's
LKM.
However, a modern version of insmod knows to check
for GPLONLY_bobsService if it can't find bobsService. But the modern
insmod will refuse to do so unless Mary's LKM
declares that it is licensed to the public under GPL.
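The lookup rule just described is small enough to write down. The following Python sketch models it with Bob's and Mary's LKMs. It is a model of the rule as described here, not the code of insmod, and the function name is hypothetical.

```python
GPL_PREFIX = "GPLONLY_"

def resolve(symbol, exports, module_is_gpl):
    """Return the exported name that satisfies the reference, or None."""
    if symbol in exports:
        return symbol
    gpl_name = GPL_PREFIX + symbol
    if gpl_name in exports and module_is_gpl:
        # Only a module that declares a GPL license may use this export.
        return gpl_name
    return None

if __name__ == "__main__":
    exports = {"GPLONLY_bobsService"}
    # Mary's LKM declares that it is licensed under GPL: the symbol resolves.
    print(resolve("bobsService", exports, module_is_gpl=True))
    # Mary's LKM makes no such declaration: the symbol stays unresolved.
    print(resolve("bobsService", exports, module_is_gpl=False))
```

A naive loader corresponds to the first test alone, which is why it fails for Mary's LKM regardless of its license.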
The purpose of this appears to be to prevent anyone from accidentally
violating a license (or from credibly claiming that he accidentally
violated the license). It is not difficult to circumvent the
restriction if you want to.
If you see this failure, it is probably because you're using an old
loader (insmod) that doesn't know about GPLONLY.
The only other cause would be that the LKM author wrote the source
code in such a way that it will never load into any Linux kernel, so
there would be no point in the author distributing it.
6.6. An LKM Must Match Prerequisite LKMs
In the same ways that an LKM must be compatible with the base kernel, it
must be compatible with any LKMs that it accesses (e.g. the first LKM calls
a subroutine in the second). The preceding sections limit their discussion
to the base kernel just to keep things simple.